BAYESIAN FREQUENTIST HYBRID INFERENCE

By Ao Yuan 

Howard University 

Bayesian and frequentist methods differ in many aspects, but share some
basic optimality properties. In practice, there are situations in which
one of the methods is preferred by some criteria. We consider the case of
inference about a set of multiple parameters, which can be divided into
two disjoint subsets. On one set, a frequentist method may be favored and
on the other, the Bayesian. This motivates a joint estimation procedure in
which some of the parameters are estimated by the Bayesian method, and the
rest by the maximum-likelihood estimator in the same parametric model, thus
keeping the strengths of both methods and avoiding their weaknesses. Such a
hybrid procedure gives us more flexibility in achieving overall inference
advantages. We study the consistency and high-order asymptotic behavior of
the proposed estimator, and illustrate its application. Also, the results
imply a new method for constructing objective priors.

1. Introduction. In statistical practice, usually either the frequentist or
the Bayesian method is used in parametric inference. Often the choice of
method is subjective. The two methods share some common basic asymptotic
properties, which have been studied extensively. The Bernstein-von Mises
theorem, for example [23, 30], states that in broad cases the Bayes and
frequentist inferences are equivalent: the two estimators are close, and
the posterior distribution around its mean is close to the distribution of
the maximum likelihood estimate (MLE) around the true parameter; both are
asymptotically normal with mean zero and the same asymptotic covariance
matrix. However, the two methods differ in many other aspects, and each has
its own advantage(s) in some situations. In application they have received
different appreciation for various reasons. Although all admissible
solutions, including the MLE, to a decision problem can be formulated as
Bayesian [37], the two methods are regarded as different in concept, theory
and history; so are they in this paper.

The Bayesian approach has enjoyed steady growth, partly due to the
development of computing facilities, but in practice the mainstream
statistical tool is still frequentist. Efron [12] summarized the main
reasons for this as the ease of use, modeling and objectivity. Lindley [25]
gave a broad review of the present position of Bayesian statistics. The two
schools mainly favor their own method, and practitioners often have to
choose one of the methods and ignore the other. In practice, there are
situations in which one of the methods is more favorable than the other by
some criteria. Thus in inferring multiple parameters, it may happen that on
part of the parameters the frequentist method is preferable, while on the
other part a Bayesian procedure is more appropriate. A practical example
comes from our analysis of genetic data, in which the means of traits
underlying each genotype are well studied in the literature. Such sound
prior knowledge favors a Bayesian treatment of this subset of parameters,
while the mixing proportions and variances for the subdistributions are new
in the investigation, and the MLE is favored on these parameters. This
motivates a joint operation of the two methods on different parts of the
parameters in the same model. Such hybrid inference gives us more
flexibility than using either method alone in achieving overall advantage.
In this paper, we propose a hybrid estimator and study its consistency and
asymptotic high-order behavior, and we illustrate its application. Also,
using the high-order expansions, we consider a new type of second-order
matching prior in the objective Bayes context.

There are some combined Bayesian and frequentist methods [1] and
compromises between the two [16]. Our method here is neither such a
combination nor a compromise, nor the quasi-Bayesian (pseudo- or
semi-Bayesian) method, nor the empirical Bayes method in the literature.
The profile likelihood approach [28, 32] also divides the parameters into
two parts, one of interest and the other nuisance, often
infinite-dimensional. Fixing the parameters of interest, the nuisance
parameters are eliminated by maximization, then the parameters of interest
are estimated based on the resulting profile likelihood. This approach and
ours have some operational similarity, but are different in nature. The
authors of [7, 8, 24] study a Bayesian method based on profile likelihood.
They obtain the frequentist inference about parameters of interest by
sampling from the posterior distribution of the semiparametric profile
likelihood. They show that the estimator is of high-order accuracy relative
to the corresponding frequentist's, and can have advantages on small
samples. They further studied the case in which the nuisance parameter may
not have a root-n convergence rate. Their method and ours share more in
common than the others. Although not identical, the two can be the same in
some cases. This point will be made clear after we introduce the notation
in the next section. Shen [33] studied the inference of parameters of
interest by marginalizing the posterior distribution. Berger, Liseo and
Wolpert [2] studied a method of eliminating nuisance parameters by
integrating the likelihood over them with a prior; the parameters of
interest are then estimated by maximizing the resulting likelihood.

In Section 2, we describe our method and study its consistency and
asymptotic high-order behavior. For the latter, we first extend the
existing high-order results for the Bayes estimate and the MLE to the
multivariate case, then, based on these results, obtain the expansion of
the hybrid estimator, which depicts the interplay of the Bayes and MLE
components in each order. We show that the Bayes estimator, the MLE and the
hybrid estimator are first-order equivalent, asymptotically normal and
efficient. In Section 3, we discuss implications of the results obtained,
the evaluation of high-order terms, and the advantages and weaknesses of
the MLE and Bayes estimators, so as to have some references in choosing
which method to use or how to hybridize them in practice. In particular, we
derive a new method for constructing objective priors by matching
high-order terms in expansions of the Bayes estimator and the MLE. We then
illustrate the application of this method and the construction of the
second-order objective prior in this sense by some examples. Also, a simple
example is given in which neither the MLE nor the Bayes estimator is
consistent while the hybrid estimator is. The regularity conditions used
and technical proofs of the theorems are given in the Appendix.

2. The method. Let $f(\cdot|\theta)$ be a given density function of the
data distribution and $\theta \in \Theta \subset R^{d}$ ($d > 1$) be the
parameter of interest. Partition the parameter as
$\theta = (\alpha,\beta) \in (A,\Omega) \subset (R^{d_1}, R^{d_2})$. Assume
that according to some criteria, on part of the parameters, $\alpha$, the
Bayesian method is preferred, and on the other part, $\beta$, the MLE. This
motivates the operation of the two methods on different parts of the
parameters in the same model simultaneously. We call such a joint procedure
of the two methods on the parameters in the same model a hybrid estimator,
which is the subject of this study.

Specifically, let $x^n = (x_1,\ldots,x_n)$ be an i.i.d. sample with
likelihood $f(x^n|\alpha,\beta) = \prod_{i=1}^n f(x_i|\alpha,\beta)$,
$\pi(\alpha)$ be the prior density for $\alpha$ and
$\pi(\alpha|x^n,\beta) \propto f(x^n|\alpha,\beta)\pi(\alpha)$ the
posterior density of $\alpha$ given the data and $\beta$. Let $\mathcal{D}$
be the decision space for inferring $\alpha$ ($\mathcal{D} = A$ for
estimation of $\alpha$), $d(x^n) \in \mathcal{D}$ a decision rule,
$W(d(x^n),\alpha)$ the loss function,
$R(d,\alpha|\beta) = E_{(\alpha,\beta)} W(d(x^n),\alpha)$ the risk on
$\alpha$ for given $\beta$,
$R(d|\beta) = \int R(d,\alpha|\beta)\pi(\alpha)\,d\alpha$ the Bayes risk on
$\alpha$ for given $\beta$ and

$$R(d|x^n,\beta) = \int W(d(x^n),\alpha)\,\pi(\alpha|x^n,\beta)\,d\alpha$$

the posterior risk for inferring $\alpha$ given $\beta$. Then for fixed
$\beta$, the Bayes decision for $\alpha$ is
$d^*(\cdot) = d^*(\cdot|\beta) = \arg\inf_{d \in A} R(d|\beta)$, and from
Bayes inference theory

$$d^*(x^n) = \arg\inf_{d \in A} R(d|x^n,\beta) = \arg\inf_{d \in A} \int W(d(x^n),\alpha) f(x^n|\alpha,\beta)\pi(\alpha)\,d\alpha.$$

The right-hand side above is the generalized Bayesian estimator of $\alpha$
for fixed $\beta$.
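The rule above can be sketched numerically. The following is only an illustration under assumptions that are not in the paper: univariate $N(\alpha,\beta)$ data with $\beta$ the fixed variance, a $N(0,1)$ prior on $\alpha$, squared-error loss, and a hypothetical helper `generalized_bayes` that replaces the integral over $\alpha$ by a sum over an equally spaced grid.

```python
import math


def generalized_bayes(xs, beta, log_prior, loss, grid):
    """Grid minimizer of the unnormalized posterior expected loss for fixed beta.

    Assumes x_i ~ N(alpha, beta) with beta the (fixed) variance; the integral
    over alpha is replaced by a sum over the same equally spaced grid.
    """
    # log f(x^n | alpha, beta) + log pi(alpha), up to an additive constant
    log_w = [sum(-(x - a) ** 2 / (2.0 * beta) for x in xs) + log_prior(a)
             for a in grid]
    top = max(log_w)  # subtract the maximum to avoid underflow
    w = [math.exp(v - top) for v in log_w]

    def expected_loss(d):
        return sum(loss(d, a) * wi for a, wi in zip(grid, w))

    return min(grid, key=expected_loss)


xs = [0.8, 1.3, 1.9, 0.4, 1.1]                 # made-up data
grid = [-4.0 + 0.02 * k for k in range(501)]   # candidate values for d and alpha
d_star = generalized_bayes(
    xs,
    beta=1.0,
    log_prior=lambda a: -0.5 * a * a,          # N(0, 1) prior (illustrative)
    loss=lambda d, a: (d - a) ** 2,            # squared-error loss
    grid=grid,
)
posterior_mean = sum(xs) / (len(xs) + 1)       # conjugate closed form for this setup
print(d_star, posterior_mean)
```

Under squared-error loss the grid minimizer should agree with the posterior mean up to the grid spacing; other losses can be passed through the `loss` argument.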

In this hybrid inference, we infer $\alpha$ by the generalized Bayesian
rule for each fixed $\beta$ and at the same time infer $\beta$ by the
frequentist MLE, that is, we are to find
$\check\theta_n = (\check\alpha_n,\check\beta_n)$ such that

$$(\check\alpha_n,\check\beta_n) = \arg\inf\sup_{(d,\beta)} \int W(d(x^n),\alpha) f(x^n|\alpha,\beta)\pi(\alpha)\,d\alpha \tag{2.1}$$

is the joint optimizer over $(d,\beta) \in (A,\Omega)$.

Remark 1. By imposing a 0-1 loss and constant prior on $\beta$, (2.1) can
be formulated as a full Bayesian solution (as in the proof of Theorem 2.1).
Thus, $(\check\alpha_n,\check\beta_n)$ generally exists and is locally
unique.

In the above, $(\check\alpha_n,\check\beta_n)$ is jointly a generalized
Bayes estimator and an MLE of $(\alpha,\beta)$. Here $\check\alpha_n$ is
not identical to the Bayes estimator based on profile likelihood such as in
[8]. The latter first eliminates the nuisance parameter $\beta$ by
maximizing the likelihood over it, along some least favorable curve, to get
$\hat\beta = \hat\beta(\alpha) = \arg\sup_{\beta} f(x^n|\alpha,\beta)$,
then computes the Bayes estimator $\tilde\alpha_n$ of $\alpha$ based on the
profile likelihood $f(x^n|\alpha) = f(x^n|\alpha,\hat\beta(\alpha))$, that
is,

$$\tilde\alpha_n = \arg\inf_{d} \int W(d(x^n),\alpha) f(x^n|\alpha)\pi(\alpha)\,d\alpha = \arg\inf_{d} \int W(d(x^n),\alpha) \sup_{\beta} f(x^n|\alpha,\beta)\pi(\alpha)\,d\alpha.$$

It is seen that generally $\tilde\alpha_n \ne \check\alpha_n$, and they can
be equal under some fair conditions, such as that the integrand in (2.1)
can be dominated, with respect to $(x^n,\beta)$, by some integrable
function in $\alpha$.

In the following, we study the consistency of the hybrid estimator. As the
Bayes estimator and the MLE are generally first-order equivalent, their
competition goes into the asymptotic high-order terms. We investigate the
high-order asymptotic behavior of (2.1); this will give us flexibility in
choosing which method to use on which parameter component(s) to achieve
high-order advantage.

Consistency of the estimator. The study of the consistency of Bayes
estimates, the MLE and their relationships has a relatively long history
[4, 11, 22, 34, 36], among others. Doob [11] established strong consistency
of Bayes estimators under very general conditions, and there is some
speculation that conditions for Bayesian consistency might be found which
are weaker than those for the MLE. Under some basic assumptions, Strasser
[34] showed that any conditions for the convergence (a.s.) of the MLE
assert the concentration (a.s.) of the posterior distribution to the true
parametric value. This does not directly imply that conditions for Bayesian
consistency are weaker, since posterior concentration to the true parameter
is not equivalent to the consistency of the Bayes estimate. The latter also
depends on the loss. There are examples in which one of the estimators is
consistent while the other is not. However, for multiple parameters, using
a hybrid estimator may overcome possible difficulty in using one method
alone. We will give such an example later. Let
$\theta_0 = (\alpha_0,\beta_0)$ be the "true" parameters generating the
data under the specified model. In the following, we give the consistency
of the hybrid estimate using a method similar to that in [5]. Generally,
the loss $W(d,\alpha)$ has the form $W(\|d - \alpha\|) = W(d - \alpha)$. To
avoid confusion, we will use $W$ for any of these functional forms. Let
$d = \dim(\theta) = \dim(\alpha) + \dim(\beta) = d_1 + d_2$.

Theorem 2.1. Assume conditions (A1)-(A9) in the Appendix, and that
$W(\cdot)$ satisfies $W(0) = 0$ and is strictly increasing and continuous
in a neighborhood of 0. Then, as $n \to \infty$ we have

$$(\check\alpha_n,\check\beta_n) \to (\alpha_0,\beta_0) \quad (\text{a.s.}).$$

High-order asymptotic behavior. High-order asymptotic expansions are used
to assess estimators when they have similar lower-order behavior. In
[9, 17, 20, 26] (among others) such expansions of the Bayes estimate and
the MLE were obtained, as were their densities and related quantities, in
the one-dimensional case. The results in [17] are more suitable to our
case. Here, we first generalize the results there to the multidimensional
case, then use them to get the expansion of the hybrid estimator.

We first give a multivariate generalization of the asymptotic expansion of
the maximum posterior density estimator, which is given by

$$\bar\theta_n = \arg\sup_{\theta} \pi(\theta|x^n) = \arg\sup_{\theta} f(x^n|\theta)\pi(\theta).$$

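We introduce the following notation. For an integer vector
$\mathbf{i} = (i_1,\ldots,i_d)'$ with $i_j \ge 0$ ($j = 1,\ldots,d$),
denote $|\mathbf{i}| = \sum_{j=1}^d i_j$ and
$\partial^{\mathbf{i}}/\partial\theta^{\mathbf{i}} = \partial^{|\mathbf{i}|}/(\partial\theta_1^{i_1}\cdots\partial\theta_d^{i_d})$.
For any function $g(\cdot) \ge 0$, define $\log g(\cdot) = 0$ if
$g(\cdot) = 0$. Denote $\mathbf{1} = (1,\ldots,1)'$ of length $d$,
$\mathbf{0} = (0,\ldots,0)'$ of length $d$,

$$L(x|\theta) := (L_1(x|\theta),\ldots,L_d(x|\theta))' = \Big(\frac{\partial}{\partial\theta_1}\log f(x|\theta),\ldots,\frac{\partial}{\partial\theta_d}\log f(x|\theta)\Big)',$$

$$L_{\mathbf{i}}(x|\theta) = \Big(\frac{\partial^{\mathbf{i}}}{\partial\theta^{\mathbf{i}}}L_1(x|\theta),\ldots,\frac{\partial^{\mathbf{i}}}{\partial\theta^{\mathbf{i}}}L_d(x|\theta)\Big)',$$

$$\rho(\theta) := (\rho_1(\theta),\ldots,\rho_d(\theta))' = \Big(\frac{\partial}{\partial\theta_1}\log\pi(\theta),\ldots,\frac{\partial}{\partial\theta_d}\log\pi(\theta)\Big)',$$

$$S_{\mathbf{i}}(\theta) = \frac{1}{n}\sum_{j=1}^n L_{\mathbf{i}}(x_j|\theta),$$

$$\Delta_{\mathbf{i}}(\theta) = \frac{1}{\sqrt{n}}\sum_{j=1}^n \big(L_{\mathbf{i}}(x_j|\theta) - E_{\theta}L_{\mathbf{i}}(x_j|\theta)\big),$$

and set $S_{\mathbf{i}} = S_{\mathbf{i}}(\theta_0)$,
$\Delta_{\mathbf{i}} = \Delta_{\mathbf{i}}(\theta_0)$ and
$E_{\mathbf{i}} = E_{\theta_0}L_{\mathbf{i}}(X|\theta_0)$. For a vector
$H = (H_1,\ldots,H_d)'$ and integer vector
$\mathbf{i} = (i_1,\ldots,i_d)'$, define
$H^{\mathbf{i}} = (H_1^{i_1},\ldots,H_d^{i_d})'$,
$(H^{\mathbf{i}}) = \prod_{j=1}^d H_j^{i_j}$ and
$\mathbf{i}! = \prod_{j=1}^d i_j!$. For $a = (a_1,\ldots,a_d)'$ and
$b = (b_1,\ldots,b_d)'$, define $a + b = (a_1+b_1,\ldots,a_d+b_d)'$,
$ab = (a_1b_1,\ldots,a_db_d)'$ and $(ab) = \prod_{i=1}^d a_ib_i$. Denote
$e_j = (0,\ldots,0,1,0,\ldots,0)'$, the $d$-vector with $j$th element 1 and
the others zero. For nonnegative integers $r \le s$ and nonnegative integer
$d$-vectors $\mathbf{l}$ and $\mathbf{i}$, the notation
$(r,s,\mathbf{l},\mathbf{i})$ stands for the collection of all nonnegative
integer $d$-vector sets $\{(\mathbf{i}_r,\ldots,\mathbf{i}_s)\}$,

$$(r,s,\mathbf{l},\mathbf{i}) = \Big\{(\mathbf{i}_r,\ldots,\mathbf{i}_s): \sum_{v=r}^{s} v\,\mathbf{i}_v = \mathbf{l},\ \sum_{v=r}^{s}\mathbf{i}_v = \mathbf{i}\Big\}.$$

Denote by $\mathbf{I}$ the Fisher information, and by $\mathbf{I}^{-1}$ its
inverse, evaluated at $\theta_0$.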
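The multi-index conventions above ($|\mathbf{i}|$, $\mathbf{i}!$ and $(H^{\mathbf{i}})$) are easy to get wrong by hand, so a small sketch may help. The helper names below are hypothetical and only mirror the definitions in the text.

```python
from itertools import product
from math import factorial, prod


def multi_indices(d, order):
    """All nonnegative integer d-vectors i with |i| = i_1 + ... + i_d = order."""
    return [i for i in product(range(order + 1), repeat=d) if sum(i) == order]


def multi_factorial(i):
    """i! = prod_j i_j!"""
    return prod(factorial(ij) for ij in i)


def multi_power(h, i):
    """(H^i) = prod_j H_j^{i_j} for a vector h and a multi-index i."""
    return prod(hj ** ij for hj, ij in zip(h, i))


third_order_d3 = multi_indices(3, 3)   # indices entering a sum over |i| = 3 when d = 3
third_order_d5 = multi_indices(5, 3)   # the same when d = 5
print(len(third_order_d3), len(third_order_d5))
```

Sums such as $\sum_{|\mathbf{i}|=3}$ in the expansions then run over `multi_indices(d, 3)`.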

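Theorem 2.2. Under conditions (B1)-(B7) in the Appendix, we have

$$\sqrt{n}(\bar\theta_n - \theta_0) = \sum_{r=0}^{k-1} n^{-r/2}H_r + O_p(n^{-k/2}),$$

where the term $O_p(n^{-k/2})$ is in the sense of [17], and the $H_r$'s are
$d$-vectors of polynomials in the $\Delta_{\mathbf{i}}$'s of degree
$r + 1$; their coefficients are polynomials in the $E_{\mathbf{i}}$'s,
$|\mathbf{i}| = 2,\ldots,r+1$, in $\mathbf{I}^{-1}$ and the
$\rho_{\mathbf{i}}$'s, given by ($0 \le r \le k-1$)

$$H_0 = \mathbf{I}^{-1}\Delta_{\mathbf{0}},$$

$$H_r = \mathbf{I}^{-1}\sum_{s+t=r}\Bigg(\sum_{|\mathbf{i}|=t-1}\rho_{\mathbf{i}}\sum_{|\mathbf{l}|=s}\sum_{(0,s,\mathbf{l},\mathbf{i})}\prod_{v=0}^{s}\frac{(H_v^{\mathbf{i}_v})}{\mathbf{i}_v!} + \sum_{|\mathbf{i}|=t}\Delta_{\mathbf{i}}\sum_{|\mathbf{l}|=s}\sum_{(0,s,\mathbf{l},\mathbf{i})}\prod_{v=0}^{s}\frac{(H_v^{\mathbf{i}_v})}{\mathbf{i}_v!} + \sum_{|\mathbf{i}|=t+1,\,t>0}E_{\mathbf{i}}\sum_{|\mathbf{l}|=s}\sum_{(0,s,\mathbf{l},\mathbf{i})}\prod_{v=0}^{s}\frac{(H_v^{\mathbf{i}_v})}{\mathbf{i}_v!}\Bigg).$$

In the above, set $\pi(\cdot)$ to a constant; then $\bar\theta_n$ is the
MLE $\hat\theta_n$, and we have

$$\sqrt{n}(\hat\theta_n - \theta_0) = \sum_{r=0}^{k-1} n^{-r/2}H_r^0 + O_p(n^{-k/2}),$$

where $H_0^0 = H_0$, and for $1 \le r \le k-1$,

$$H_r^0 = \mathbf{I}^{-1}\sum_{s+t=r}\Bigg(\sum_{|\mathbf{i}|=t}\Delta_{\mathbf{i}}\sum_{|\mathbf{l}|=s}\sum_{(0,s,\mathbf{l},\mathbf{i})}\prod_{v=0}^{s}\frac{((H_v^0)^{\mathbf{i}_v})}{\mathbf{i}_v!} + \sum_{|\mathbf{i}|=t+1,\,t>0}E_{\mathbf{i}}\sum_{|\mathbf{l}|=s}\sum_{(0,s,\mathbf{l},\mathbf{i})}\prod_{v=0}^{s}\frac{((H_v^0)^{\mathbf{i}_v})}{\mathbf{i}_v!}\Bigg).$$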

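Next, set $\theta = \alpha$; then $\check\theta_n = \tilde\theta_n$ is the
Bayes estimate of $\theta$. To get the corresponding expansion for
$\sqrt{n}(\tilde\theta_n - \theta_0)$, we need the following notation. Let
$l(x|\theta) = \log f(x|\theta)$, $\varrho(\theta) = \log\pi(\theta)$,

$$l_{\mathbf{i}}(x|\theta) = \frac{\partial^{\mathbf{i}}}{\partial\theta^{\mathbf{i}}}l(x|\theta),\qquad s_{\mathbf{i}}(\theta) = \frac{1}{n}\sum_{j=1}^n l_{\mathbf{i}}(x_j|\theta),\qquad \delta_{\mathbf{i}}(\theta) = \frac{1}{\sqrt{n}}\sum_{j=1}^n\big(l_{\mathbf{i}}(x_j|\theta) - E_{\theta}l_{\mathbf{i}}(x_j|\theta)\big),$$

and set $s_{\mathbf{i}} = s_{\mathbf{i}}(\theta_0)$,
$\delta_{\mathbf{i}} = \delta_{\mathbf{i}}(\theta_0)$ and
$\varepsilon_{\mathbf{i}} = E_{\theta_0}l_{\mathbf{i}}(X|\theta_0)$. We
make the convention that for nonnegative integer vectors
$\mathbf{i} = (i_1,\ldots,i_d)'$ and $\mathbf{j} = (j_1,\ldots,j_d)'$, the
notation $\mathbf{j} - \mathbf{i}$ implies $\mathbf{j} \ge \mathbf{i}$,
that is, $j_r \ge i_r$ ($r = 1,\ldots,d$).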

Theorem 2.3. Under conditions (B1)-(B10) in the Appendix, we have

$$\sqrt{n}(\tilde\theta_n - \theta_0) = \sum_{r=0}^{k-1} n^{-r/2}G_r + O_p(n^{-k/2}),$$

where $G_0 = H_0$ and for $1 \le r \le k-1$,

r—1 m m 

Qr = Mo,r + E E E E n iQv)/'^v\ 

m=l |i|=l |l|=m (l,r?i,l,i) f=l 

+ E i!Mi,oE E I[i^v)nvl (note Qi = Mo,i), 

|i|g(3,r) |l|=r(l,r-l,l,i)t)=l 

Mj,. = ({^(a)}I)-i E (0<|j|</c-l), 

|i|G(2(lAr)+|j|,3r+|j|> J- 

where $\{\sigma(a)\} = \mathrm{diag}(\sigma(a_1),\ldots,\sigma(a_d))$,
$\sigma(a_r)$ is the $a_r$th marginal moment of $\theta_r$ with
$\theta \sim N(\mathbf{0},\mathbf{I}^{-1})$,
$\Psi_{\mathbf{i}} = \Psi_{\mathbf{i}}(\mathbf{0})$ is a $d$-vector of
multivariate normal moments associated with the loss [with components of
the form $\sigma(\mathbf{i})$ as given in the proof],
$\langle a,b\rangle$ is the set of odd integers $s$ with $a \le s \le b$
and the $N_{\mathbf{i},r}$'s are random variables given in Lemma 1 in the
Appendix. For general loss functions instead of the one in (B9), the
$Q_r$'s have more terms in more involved forms, and are outlined at the end
of the proof.



Now, based on the expansions of the MLE and the Bayes estimate, we give the
asymptotic expansion for the hybrid estimator
$\sqrt{n}(\check\theta_n - \theta_0)$, which depicts the interplay of the
two components in each order. Denote
$\mathbf{I}^{-1} = (\mathbf{I}^{ij})_{1 \le i,j \le 2}$ and
$a = (a_1', a_2')'$ as the componentwise notations corresponding to the two
sets of parameters $(\alpha,\beta)$.



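Theorem 2.4. Assume conditions (B1)-(B5), and (B6)-(B10) (with $\theta$,
$\Theta$ and $a$ replaced by $\alpha$, $A$ and $a_1$), in the Appendix,
then

$$\sqrt{n}(\check\theta_n - \theta_0) = \sqrt{n}\begin{pmatrix}\check\alpha_n - \alpha_0\\ \check\beta_n - \beta_0\end{pmatrix} = \sum_{r=0}^{k-1} n^{-r/2}\begin{pmatrix} g_r\\ h_r\end{pmatrix} + O_p(n^{-k/2}),$$

where $(g_r', h_r')'$ ($0 \le r \le k-1$) are given by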

gr \ _ T-1 V- / V- A . V- V- TT W ^ g 



s+t=r\ji|^^ |l|=s(0,s,l,i)n=0^' \ ^ 

+ E E n- 

|i|=t+l,t>0 |l|=s(0,s,l,i)?^=0 



l\\h 



g. ^ 



(r = l,...,A;-l) 



and 

s+t=r\i\=t-l ^ ^ |l|=s(0,s,l,i)''=0 

pj = p^(q;o) is the first di components of pj, 's as in Theorem 2.2 with 
Pi = (pp 0)' and qr = qr(^o) is the di- dimensional version ofQr in Theorem 
2.3 with replaced by 1^^ . 



For general loss functions rather than that given in (B9), the results are
analogous. Below we give the first three terms in the expansions for each
of the estimators.
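Fact. (i) For the MLE in Theorem 2.2, the first three terms are

$$H_0^0 = \mathbf{I}^{-1}\Delta_{\mathbf{0}},\qquad H_1^0 = \mathbf{I}^{-1}\Big(\sum_{|\mathbf{i}|=1}\Delta_{\mathbf{i}}((H_0^0)^{\mathbf{i}}) + \sum_{|\mathbf{i}|=2}E_{\mathbf{i}}((H_0^0)^{\mathbf{i}})/\mathbf{i}!\Big),$$

$$H_2^0 = \mathbf{I}^{-1}\Big(\sum_{|\mathbf{i}|=2}\Delta_{\mathbf{i}}((H_0^0)^{\mathbf{i}})/\mathbf{i}! + \sum_{|\mathbf{i}|=1}\Delta_{\mathbf{i}}((H_1^0)^{\mathbf{i}}) + \sum_{|\mathbf{i}|=3}E_{\mathbf{i}}((H_0^0)^{\mathbf{i}})/\mathbf{i}! + \sum_{|\mathbf{i}|=|\mathbf{j}|=1}E_{\mathbf{i}+\mathbf{j}}\big(((H_0^0)^{\mathbf{i}}(H_1^0)^{\mathbf{j}}) + ((H_0^0)^{\mathbf{j}}(H_1^0)^{\mathbf{i}})\big)/2\Big).$$

(ii) For the maximum posterior estimator in Theorem 2.2, the first three
terms are

$$H_0 = H_0^0,\qquad H_1 = H_1^0 + \mathbf{I}^{-1}\rho_{\mathbf{0}},$$

$$H_2 = \mathbf{I}^{-1}\Big(\sum_{|\mathbf{i}|=1}\rho_{\mathbf{i}}(H_0^{\mathbf{i}}) + \sum_{|\mathbf{i}|=2}\Delta_{\mathbf{i}}(H_0^{\mathbf{i}})/\mathbf{i}! + \sum_{|\mathbf{i}|=1}\Delta_{\mathbf{i}}(H_1^{\mathbf{i}}) + \sum_{|\mathbf{i}|=3}E_{\mathbf{i}}(H_0^{\mathbf{i}})/\mathbf{i}! + \sum_{|\mathbf{i}|=|\mathbf{j}|=1}E_{\mathbf{i}+\mathbf{j}}\big((H_0^{\mathbf{i}}H_1^{\mathbf{j}}) + (H_0^{\mathbf{j}}H_1^{\mathbf{i}})\big)/2\Big).$$

(iii) For the Bayes estimator in Theorem 2.3, the first three terms are

$$G_0 = H_0^0,\qquad G_1 = H_1^0 + \mathbf{I}^{-1}\rho_{\mathbf{0}} + Q_1,$$

$$Q_1 = M_{\mathbf{0},1} = (\{\sigma(a)\}\mathbf{I})^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}\Psi_{\mathbf{i}}/\mathbf{i}!,$$

$$G_2 = H_2 + Q_2,\qquad Q_2 = M_{\mathbf{0},2} + \sum_{|\mathbf{i}|=1}M_{\mathbf{i},1}(Q_1^{\mathbf{i}}),$$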

Mo,2 = iwi^mr' E ^ f^i + E Ao)j)) *i, 

|i|=3 • V |j|=l / 

j>i,|j|=3 V |1|=1 / 

(iv) For the hybrid estimator in Theorem 2.4, the first three terms are 

So^-H°, g =H?+ |i|=3 



^ Po 
where = ^(a-^-ii (a')), a ~ (0, I^i), and ^Tji = ^0,[0/(xi|a, /3o)]|«=c

\ lil=l / 

where H* is in which H° is replaced by (g'., h^)' (r = 0, 1) and q2 is Q2 
with (I,a,*i,£:i,5i, Ao) replaced by {P\8Li,^l£l,6l,Al). 

Remark 2. Since
$G_0 = H_0^0 = (g_0', h_0')' = \mathbf{I}^{-1}\Delta_{\mathbf{0}}$, the
Bayes estimator, the MLE and the hybrid estimator are asymptotically
first-order equivalent, normal and efficient.




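Computation consideration. Although in some cases
$(\check\alpha_n,\check\beta_n)$ has closed form, the solution of (2.1)
generally may not. Denote
$G_n(d,\beta) = \int W(d(x^n),\alpha) f(x^n|\alpha,\beta)\pi(\alpha)\,d\alpha$.
Since

$$\sup_{\beta}\Big[\inf_{d} G_n(d,\beta)\Big] \le \inf\sup_{(d,\beta)} G_n(d,\beta) \le \inf_{d}\Big[\sup_{\beta} G_n(d,\beta)\Big]$$

and the "=" signs do not always hold, generally

$$\arg\sup_{\beta}\Big[\inf_{d} G_n(d,\beta)\Big] \ne (\check\alpha_n,\check\beta_n) = \arg\inf\sup_{(d,\beta)} G_n(d,\beta) \ne \arg\inf_{d}\Big[\sup_{\beta} G_n(d,\beta)\Big].$$

However, if $\arg\inf_{d} G_n(d,\beta)$ does not depend on $\beta$, then
$(\check\alpha_n,\check\beta_n) = \arg\sup_{\beta}[\inf_{d} G_n(d,\beta)]$.
Similarly, if $\arg\sup_{\beta} G_n(d,\beta)$ does not depend on $d$, then
$(\check\alpha_n,\check\beta_n) = \arg\inf_{d}[\sup_{\beta} G_n(d,\beta)]$.

When $(\check\alpha_n,\check\beta_n)$ is not directly computable, an
iterative procedure for computing it can be formulated using the
Newton-Raphson method.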
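The fact that the order of the two optimizations matters can be checked on a toy example. The sketch below is not the paper's $G_n$: it uses a made-up function $G(d,\beta) = (d-\beta)^2$ on a three-point grid, and a hypothetical helper `maximin_and_minimax`, only to show that the sup-inf value can be strictly smaller than the inf-sup value.

```python
def maximin_and_minimax(G, ds, betas):
    """Return (sup_beta inf_d G, inf_d sup_beta G) over finite grids."""
    maximin = max(min(G(d, b) for d in ds) for b in betas)
    minimax = min(max(G(d, b) for b in betas) for d in ds)
    return maximin, minimax


grid = [0.0, 0.5, 1.0]  # same toy grid for d and beta
lo, hi = maximin_and_minimax(lambda d, b: (d - b) ** 2, grid, grid)
print(lo, hi)  # lo never exceeds hi; here the inequality is strict
```

When the inner optimizer does not depend on the outer variable, the two values coincide, which is the situation exploited in the text.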



3. Implications and examples. The results obtained in Section 2 imply a new
method for the construction of objective priors. They can also be used to
assess the high-order behavior of the three estimators and applied to
practical problems. Below we discuss these issues with some examples.

Implication for objective Bayes. In the objective Bayesian context, the
prior is selected by some objective rule instead of subjective choice. Such
results include the uniform prior, the reference prior [3] and the
noninformative prior [19]. Jeffreys' general rule for such a prior is
$\pi(\theta) \propto |\mathbf{I}(\theta)|^{1/2}$. Under some regularity
conditions and without nuisance parameters, the reference prior coincides
with Jeffreys' prior. The coverage matching prior is one for which the
posterior probability matches the corresponding frequentist probability
with high accuracy. The authors of [27, 35, 38], among others, studied
priors with second-order probability matching. A comprehensive review of
this area can be found in [21].

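Here we use an idea similar to that of coverage matching to select a prior
such that the Bayes estimate and the MLE match for high-order terms in
their expansions. We say that a prior $\pi(\cdot)$ on $\theta$ (or on
$\alpha$) is an $r$th order expansion matching prior for the Bayes estimate
(or for the hybrid estimator) if the first $r$ terms in its expansion under
$\pi(\cdot)$ match those of the MLE, that is, under the notation of the
last section,

$$G_i = H_i^0 \quad [\text{or } (g_i', h_i')' = H_i^0], \qquad i = 0,\ldots,r-1.$$

As $G_0 = H_0^0 = (g_0', h_0')'$ under the conditions of Theorems 2.3 and
2.4, any prior is automatically first-order matching. In the above
equations, all quantities involving $\theta_0$ in their definition should
be replaced by $\theta$ (or $\alpha$), so these equations are a set of
differential equations of order $r - 1$ for the prior density as a function
of $\theta$ (or $\alpha$). In particular, by the expressions in the Fact, a
second-order matching prior $\pi(\theta)$ in this sense should satisfy

$$\frac{\partial\log\pi(\theta)}{\partial\theta} = -\{\sigma(a)\}^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}(\theta)\Psi_{\mathbf{i}}(\theta)/\mathbf{i}!.$$

To solve for $\pi(\cdot)$ from the above partial differential equations,
denote
$b(\theta) = -\{\sigma(a)\}^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}\Psi_{\mathbf{i}}/\mathbf{i}! = (b_1(\theta),\ldots,b_d(\theta))'$
and
$\theta_{-k} = (\theta_1,\ldots,\theta_{k-1},\theta_{k+1},\ldots,\theta_d)'$;
then it is seen that the solution exists if and only if

$$\frac{\partial b_i(\theta)}{\partial\theta_j} = \frac{\partial b_j(\theta)}{\partial\theta_i} \qquad (1 \le i < j \le d),$$

which is equivalent to the existence of functions $v_k(\theta_{-k})$
($k = 1,\ldots,d$) such that the following set of equations of indefinite
integrals holds:

$$\int b_i(\theta)\,d\theta_i + v_i(\theta_{-i}) = \int b_j(\theta)\,d\theta_j + v_j(\theta_{-j}) \qquad (1 \le i < j \le d);$$

then for any $k$ ($1 \le k \le d$), up to a constant,

$$\log\pi(\theta) = \int b_k(\theta)\,d\theta_k + v_k(\theta_{-k}).$$

Notationally, we denote the solution as

$$\pi(\theta) \propto \exp\Big\{-\int\{\sigma(a)\}^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}(\theta)\Psi_{\mathbf{i}}(\theta)/\mathbf{i}!\,d\theta\Big\}.$$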
Similarly, for the hybrid estimator, a second-order matching prior
$\pi(\cdot)$ on $\alpha$ should satisfy

which is a set of equations in dTr(a)/da, with /3]^ as a hyper parameter. As 
in Example 1 below, when I^^ = 0, I^^ is independent on components of a, 
and ai = 21i, we will have Tr{a) oc \I^^{cx, (31)1^^"^ . 

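The $\Psi_{\mathbf{i}}$'s are $d$-vector functions of the
$\sigma(\mathbf{i})$'s, which can be found in various sources. Denote
$\mathbf{I}^{-1} = (\sigma_{st})$; we have
$\sigma((4,0,0)) = 3\sigma_{11}^2$,
$\sigma((3,1,0)) = 3\sigma_{11}\sigma_{12}$,
$\sigma((2,2,0)) = \sigma_{11}\sigma_{22} + 2\sigma_{12}^2$ and
$\sigma((2,1,1)) = \sigma_{11}\sigma_{23} + 2\sigma_{12}\sigma_{13}$.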
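The four moment formulas just listed are instances of the pairing (Isserlis) formula for moments of a mean-zero normal vector. As a hedged sketch, the hypothetical helper `gaussian_moment` below computes $\sigma(\mathbf{i})$ for any multi-index by summing over pairings; the covariance matrix is an arbitrary made-up example, not a quantity from the paper.

```python
def gaussian_moment(idx, cov):
    """E[prod_j X_j^{idx[j]}] for X ~ N(0, cov), via the pairing formula."""
    # expand the multi-index into a list of coordinate labels, e.g. (2,1,0) -> [0,0,1]
    labels = [j for j, power in enumerate(idx) for _ in range(power)]

    def pair(rest):
        if not rest:
            return 1.0
        if len(rest) % 2:
            return 0.0  # odd-order moments of a centered normal vanish
        first, others = rest[0], rest[1:]
        # pair the first label with each remaining one and recurse on the rest
        return sum(cov[first][others[k]] * pair(others[:k] + others[k + 1:])
                   for k in range(len(others)))

    return pair(labels)


# arbitrary symmetric positive definite matrix playing the role of (sigma_st)
cov = [[2.0, 0.5, 0.3],
       [0.5, 1.0, 0.2],
       [0.3, 0.2, 1.5]]
m400 = gaussian_moment((4, 0, 0), cov)
m310 = gaussian_moment((3, 1, 0), cov)
m220 = gaussian_moment((2, 2, 0), cov)
m211 = gaussian_moment((2, 1, 1), cov)
print(m400, m310, m220, m211)
```

The recursion is exponential in $|\mathbf{i}|$, which is harmless for the fourth-order moments needed here.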

Like the second-order probability matching prior, the second-order ex- 
pansion matching prior may not always exist nor be unique. For the former, 
Mukerjee and Ghosh [27] gave closed-form examples only under some special 
parameterizations. Below we give several examples of second-order expan- 
sion matching priors in natural parametrization. 



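Example 1. When
$\mathbf{I} = \{(I_{11}(\theta_1),\ldots,I_{dd}(\theta_d))\}$ is in
independent parametric form and $a = 2\cdot\mathbf{1}$, we have
$\{\sigma(a)\} = \mathbf{I}^{-1}(\theta)$. Assume the conditions for
exchange of expectation and differentiation; then for
$\mathbf{i} = 3e_j$, some $j$,
$\varepsilon_{\mathbf{i}} = -\partial I_{jj}(\theta_j)/\partial\theta_j$
and $\Psi_{\mathbf{i}} = 3I_{jj}^{-2}(\theta_j)e_j$. For
$\mathbf{i} \ne 3e_j$, any $j$, $\varepsilon_{\mathbf{i}} = 0$. So

$$-\{\sigma(a)\}^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}\Psi_{\mathbf{i}}/\mathbf{i}! = \frac{1}{2}\Big(\frac{\partial I_{11}(\theta_1)/\partial\theta_1}{I_{11}(\theta_1)},\ldots,\frac{\partial I_{dd}(\theta_d)/\partial\theta_d}{I_{dd}(\theta_d)}\Big)'.$$

It is easy to see that
$v_k(\theta_{-k}) = \sum_{j \ne k}(1/2)\int(\partial I_{jj}(\theta_j)/\partial\theta_j)/I_{jj}(\theta_j)\,d\theta_j$,
and the second-order expansion matching prior is

$$\pi(\theta) \propto \prod_{j=1}^{d} I_{jj}^{1/2}(\theta_j) = |\mathbf{I}(\theta)|^{1/2},$$

which is Jeffreys' prior.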



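Example 2. Consider the data model $N(\mu,\sigma^2)$ with parameter
$\theta = (\mu,\sigma^2)$. We have
$\mathbf{I}^{-1} = \{(\sigma^2, 2\sigma^4)\}$. In this case,
$\varepsilon_{(3,0)} = \varepsilon_{(1,2)} = 0$,
$\varepsilon_{(2,1)} = 1/\sigma^4$, $\varepsilon_{(0,3)} = 2/\sigma^6$,
$\Psi_{(2,1)} = (\sigma(3,1),\sigma(2,2))' = (0, 2\sigma^6)'$ and
$\Psi_{(0,3)}(\mathbf{0}) = (\sigma(1,3),\sigma(0,4))' = (0, 12\sigma^8)'$.
If we set $a = (2,2)'$, then $\{\sigma(a)\} = \mathbf{I}^{-1}$ and
$-\{\sigma(a)\}^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}\Psi_{\mathbf{i}}/\mathbf{i}! = (0, -(5/2)\sigma^{-2})'$.
It is easy to check that if we choose
$v_1(\sigma^2) = -(5/2)\int(\sigma^2)^{-1}\,d\sigma^2$ and
$v_2(\mu) = \text{const.}$, then the second-order expansion matching prior
is $\pi(\theta) = \pi(\mu,\sigma^2) \propto (\sigma^2)^{-5/2}$. In
contrast, Jeffreys' prior in this case is
$\pi(\mu,\sigma^2) \propto (\sigma^2)^{-3/2}$.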
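The arithmetic of Example 2 can be rechecked numerically. The sketch below plugs the values of $\varepsilon_{\mathbf{i}}$ quoted in the example into $b(\theta) = -\{\sigma(a)\}^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}\Psi_{\mathbf{i}}/\mathbf{i}!$, computing the $\Psi_{\mathbf{i}}$ from normal moments under the diagonal $\mathbf{I}^{-1} = \{(\sigma^2, 2\sigma^4)\}$. The function names are hypothetical, and `v` stands for $\sigma^2$.

```python
import math


def normal_moment(k, var):
    """E[Z^k] for Z ~ N(0, var): (k-1)!! * var^(k/2) if k is even, else 0."""
    if k % 2:
        return 0.0
    out = 1.0
    for j in range(k - 1, 0, -2):
        out *= j
    return out * var ** (k // 2)


def matching_prior_gradient(v):
    """b(theta) for the N(mu, v) model with theta = (mu, v) and a = (2, 2)'."""
    inv_info = (v, 2.0 * v * v)  # diagonal of I^{-1}, which equals {sigma(a)}
    # eps_i = E[d^i log f / d theta^i] for |i| = 3, as quoted in Example 2
    eps = {(3, 0): 0.0, (2, 1): 1.0 / v ** 2, (1, 2): 0.0, (0, 3): 2.0 / v ** 3}
    total = [0.0, 0.0]
    for (i1, i2), e in eps.items():
        fact = math.factorial(i1) * math.factorial(i2)
        # Psi_i = (sigma(i + e_1), sigma(i + e_2))'; coordinates are independent here
        psi = (normal_moment(i1 + 1, inv_info[0]) * normal_moment(i2, inv_info[1]),
               normal_moment(i1, inv_info[0]) * normal_moment(i2 + 1, inv_info[1]))
        for k in range(2):
            total[k] += e * psi[k] / fact
    return tuple(-total[k] / inv_info[k] for k in range(2))


b_mu, b_var = matching_prior_gradient(2.0)
print(b_mu, b_var)  # second component should equal -(5/2) / v
```

Integrating the second component in $\sigma^2$ gives $\log\pi = -(5/2)\log\sigma^2$ up to a constant, in line with the prior stated in the example.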
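Example 3. Consider the bivariate normal model with parameters
$\theta = (\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho)$, with $\rho$ being the
correlation coefficient. Suppose we make a hybrid inference with a Bayesian
component on $\alpha = (\sigma_1^2,\sigma_2^2,\rho)$ and want a
second-order matching prior $\pi(\alpha)$. Here we need to replace the
quantities in the partial differential equations by their counterparts
$\mathbf{I}^{22}$, $\varepsilon_{\mathbf{i}}^2$ and $\Psi_{\mathbf{i}}^2$
for the $\alpha$ block. We have $\mathbf{I}^{12} = \mathbf{0}$ and

$$\mathbf{I}^{22}(\theta) = \mathbf{I}^{22}(\alpha) = \begin{pmatrix} 2\sigma_1^4 & 2\rho^2\sigma_1^2\sigma_2^2 & \rho(1-\rho^2)\sigma_1^2\\ 2\rho^2\sigma_1^2\sigma_2^2 & 2\sigma_2^4 & \rho(1-\rho^2)\sigma_2^2\\ \rho(1-\rho^2)\sigma_1^2 & \rho(1-\rho^2)\sigma_2^2 & (1-\rho^2)^2 \end{pmatrix},$$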

c2 _ 16-7p^ c2 _ 16-7p^ c2 _ 10p^+8p^-12p c2 _ 

(3,0,0) - 8(l-p2)a« ' ^(0,3,0) " 8(l-p2)(T« ' ^(0,0,3) " (l-p^)^ ' ^(2,1,0) " 

3p2 r.2 _ 3p2 p2 _ 3p^-5p r.2 _ 3p3-5p 

8(l-p2)CT^^^' '^(1,2,0) - 8(rVVM' (2,0,1) ~ 4(1-p2)2ct4' '^(0,2,1) " 4(l-p2)2(7^ ' 
C.2 _ (l+2p2)<72_p2(3+p2) _ (l+2p2)a2_p2(3^^2) _ p(l+p2) _ 

(1,0,2) - (l-p2)3o-2 ' ^^(0,1,2) - (l-p2)3cr2 ' ^^(1,1,1) ~ 4(l-p2)2<j2a2' 

^\ 0) = (12a8, 12^V6ai, 6p(l -p2y6y^ ^2^ ^ ^ (12^,2^2 ^ ^6^ I2a8, 6p(l - 
*2^^o^3) = (3/^(1 - p^falM^ - p^fal-iiX - p^)')', ^^.^.^ = 
{I2p^afal4{l + 2p^)afai2p{l-p^){l + 2p')ataiy, *fi,2,o) = (4(1 + 2^4)^4^4^ 
12p^alal2p{l - p^)af{af + 2pV|))', ^f,^,^,) = {6p{l - p')al2p{l - p^){l + 
2p2)a4ai, 2(1 + _ p2)2^4y^ ^2^ ^ ^ (2/,(l - p^)aUcrf + 2pV|), 6p(l - 

p2)af,2(l + p2)(l_p2)2^4y^ ^2^^^^^^ ^ ((l + 2p2)(l-p2)2^4^4^2(i_^2)2^2^2^ 

3p(l - p^alY, *2^,i,2) = (V(l - p')Vfai,2(l + _ ^2)2^4^3^(1 _ 

p2)3^2y^^2^^^^^^^(2p(l-p2)(l + 2p2)^4^2^2/,(l-p2)(l + 2p2)^2^4^4^2(i_ 

p2)2^2^2y^ T^j^g a2 = (2,2,2)', then {cr(a2)}-i = {(2^7^, 2ct|, (1 - p2)2)}-i. 
In this case, the second-order expansion matching prior $\pi(\alpha)$
cannot be evaluated in closed form, and a numerical method, such as in
[18], is needed.

The second-order expansion matching prior
$\pi(\theta) = \pi(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho)$ can also be
obtained similarly. In this case for $|\mathbf{i}| = 3$, there are 35
$\varepsilon_{\mathbf{i}}$'s and $\Psi_{\mathbf{i}}$'s each; the prior
cannot be evaluated in closed form and a numerical method is needed.



Evaluation of high-order behavior. Note that in the $H_i$'s the
$\Delta_{\mathbf{i}}$'s are asymptotically normal random vectors, the
$\rho_{\mathbf{i}}$'s and $E_{\mathbf{i}}$'s are constant vectors, and in
the $G_i$'s the $N_{\mathbf{i}}$'s are random variables determined by the
$H_i$'s. Thus, asymptotically, $H_i$, $G_i$, $h_i$ and $g_i$ converge in
distribution to multivariate polynomials in normal vectors of degree $i$.
For the MLE, $H_i^0$ is an $i$th form of normal vectors. Let $\bar H_i$,
$\bar H_i^0$, $\bar G_i$, $\bar g_i$ and $\bar h_i$ be the weak limits of
$H_i$, $H_i^0$, $G_i$, $g_i$ and $h_i$ ($i = 0,\ldots,k-1$). The
first-order terms in the expansions often have mean zero, so their
asymptotic behavior can be characterized by their asymptotic variances. But
high-order terms generally have nonzero mean, so using asymptotic variances
alone as a criterion to evaluate their behavior is inappropriate. We
therefore consider a criterion combining asymptotic mean (bias) and
variance. We say that $\hat\theta_n$ is $r$th order preferred over
$\tilde\theta_n$ if
$\|E_{\theta_0}(\bar H_i^0)\| + \|\mathrm{Cov}_{\theta_0}(\bar H_i^0)\| = \|E_{\theta_0}(\bar G_i)\| + \|\mathrm{Cov}_{\theta_0}(\bar G_i)\|$
($i = 0,\ldots,r-2$), and
$\|E_{\theta_0}(\bar H_{r-1}^0)\| + \|\mathrm{Cov}_{\theta_0}(\bar H_{r-1}^0)\| < \|E_{\theta_0}(\bar G_{r-1})\| + \|\mathrm{Cov}_{\theta_0}(\bar G_{r-1})\|$,
and vice versa.

From the Fact, we see that $G_0 = H_0^0$ (hence $\bar G_0 = \bar H_0^0$).
Also
$G_1 = H_1 + M_{\mathbf{0},1} = H_1^0 + \mathbf{I}^{-1}\rho_{\mathbf{0}} + M_{\mathbf{0},1}$
with
$M_{\mathbf{0},1} = (\{\sigma(a)\}\mathbf{I})^{-1}\sum_{|\mathbf{i}|=3}\varepsilon_{\mathbf{i}}\Psi_{\mathbf{i}}/\mathbf{i}!$.
Note that $H_1^0$ is a random vector and $G_1$ is $H_1^0$ plus a constant
vector $\mathbf{I}^{-1}\rho_{\mathbf{0}} + M_{\mathbf{0},1}$. Similarly,
for the second-order term of the hybrid estimator, its Bayesian components
are those of the MLE plus a constant vector. Hence for the second-order
behavior, we only need to consider the expected asymptotic bias (EAB), and
$\hat\theta_n$ is second-order preferred over $\tilde\theta_n$ if

$$\|E_{\theta_0}(\bar H_1^0)\| < \|E_{\theta_0}(\bar G_1)\|$$

and vice versa. We have:



Proposition. Lei Dj = £'0o[Lej(X|0o)Lo(X|0o)]; and andl-^ he 



'3 - ^Oo l-^j ' 

the jth row and jth column o/I~^, then 



(i) 

(ii) 
fiii) 



Ee,{Ul)=I-'(j:n,Ij'+j:B, 

\i=l i,i=l 



,1-^1-^17^ 



Ee,{Gi) = Ee.iUD+I-^Po + Mo,i, 
gi 



^00 



EeAilD+l Ii|=3 



By this proposition we are able to evaluate which estimator has second- 
order advantage under each specification of the likelihood model, prior and 
the loss, by the criterion of EAB. 



Example 4. We want to evaluate the overall and small-sample advantage of an
estimator. We first evaluate the overall advantage for the model in
Example 2 in four cases: (a) the full MLE of $(\mu,\sigma^2)$; (b) the full
Bayes estimator of $(\mu,\sigma^2)$; (c) the hybrid Bayes-MLE estimator,
with the Bayesian component on $\mu$ and the MLE on $\sigma^2$; and (d) the
hybrid MLE-Bayes estimator, with the MLE on $\mu$ and the Bayesian
component on $\sigma^2$. The four estimates are first-order equivalent, so
we study their second-order behavior by the EAB criterion. Here,








[Display equation lost in extraction.]

$E_{e_1+e_1} = (0, 1/\sigma_0^4)'$,
$E_{e_1+e_2} = E_{e_2+e_1} = (1/\sigma_0^4, 0)'$,
$E_{e_2+e_2} = (0, 2/\sigma_0^6)'$.

For (a), we have Ee^ifll) = (0, -2crg + + 16a^°)'. 

For (b), if we use the prior vr(0) = 7r(/x)7r(c7^), ^ ~ A^(/ii, erf ) with (/Lti, erf ) 
known, 7r(cr^) = Aie"'^^"" , Ai > known. Then, po = -((/"o - /Ui)/cri, Ai)'. If 
we use the loss W{d,6) = {di - /i)^ + {d2 -cr^)^, then {cr(a)} = and for 
|i| =3, *i = (cr(i + ei), cj(i + 62))'. We have %o) = £(1,2) = 0, £{2,1) = ctq^, 
^(0,3) =2^To-^ *(o,3) = (a(l,3),a(0,4)y = (0,12a§y. Thus, Mo,i = {0^)', 
and £^0o(G?) = ((/ii - Aio)o-§/o-f , -2a^Ai + + 16cr(i°)'. Note the MLE is 
in closed form and the Bayesian is not. If one has good prior knowledge of 
6, that is, /.fi fiQ and Ai (^q"^, then i?g(,(G5') ~ ^'^^(H^'), so the Bayes 
estimator and MLE have similar second-order behavior, while the former 
has small-sample advantage and the latter is computationally preferable. 
One can even choose Ai « (ctq -|- 16crQ)/2, so -^^^(G^') ~ 0, thus the Bayesian 
has smaller asymptotic second-order bias than the MLE, but also lost its 
small-sample advantage on the estimate of . 

For (c), let 7r(/x) be as in (b), W{di,fi) = {di — ^)^. Then, = — 
^o)/o-i, {cr(ai)} = and for i = i = 3, ^} = 3a^ and £l = 0. Thus, 
£;0o(gi,hi) = ((^i-^o)cro/c^ii-2o-o + crg-hl6<T(i°). As in (b), if one has good 
informative prior on fi, then the hybrid estimator can have small-sample ad- 
vantage over the MLE, and they are compatible in second-order behavior 
and computation. If we do not have sound information on o"^, (c) often has 
better second-order property than (b). 

For (d), let TT{a^) as in (b), W{d2,cr^) = (^2 - cr^f- Then, pi = -Ai, 
{o-(a2)} = and for i = i = 3, *f = 12a^ and £f = 0. Thus E0^{hi,gi) = 
{0,-2Xia^ + a^ + lQcrl°). In this case fin is in closed form and is not. If we 
infer $\tau = 1/\sigma^2$ and use a Gamma prior on it, then the hybrid
estimator of $(\mu,\tau)$ has closed form, while the full Bayes estimator
(b) does not. This has practical meaning when one seeks high-order accuracy
in addition to computational advantage. As (b) is usually obtained by
numerical approximation methods, such as Markov chain Monte Carlo, its
accuracy is not easy to assess.

Next, we discuss small-sample advantage. Consider the case where the
$x_i$'s are i.i.d. multivariate normal $N(\mu,\Omega)$. Suppose we have
good prior knowledge on $\Omega$ in the form of a Wishart prior
$\pi(\Omega)$, but relative ignorance about $\mu$. So we use the MLE for
$\mu$ and, jointly, a Bayes estimate for $\Omega$. Since in this case
$\arg\sup_{\mu} G_n(d,\mu) = \hat\mu = \bar x$ does not depend on $d$,
$\check\Omega_n$ is just the Bayes solution given $\mu = \bar x$, which can
be evaluated numerically.

On the other hand, suppose we have good information about $\mu$ summarized
by the prior $N(\mu_1,\Omega)$, but not enough knowledge on $\Omega$. Note
that here the prior has the same unknown variance matrix as that in the
data distribution, so that the Bayesian part has a closed-form solution. It
can be checked that Theorem 2.1 still applies to this case. We want to use
the good prior experience for $\mu$, but a full Bayesian estimate for
$(\mu,\Omega)$ is not easy, so we estimate $\Omega$ by the MLE. It is easy
to see that for given $\Omega$, the posterior on $\mu$ is
$N(n\bar x/(n+1) + \mu_1/(n+1), \Omega/(n+1))$. With the loss being
absolute error, squared error or 0-1 error on $\mu$, $\check\mu_n$ is the
posterior median, mean or mode, respectively, which are all the same in
this case and independent of $\Omega$.
So we have (/i„,n„) = (;^x+ ^ X;r=i(xi - An)'(xi + - 
An)'(/^i ~ An))) which is given in closed form, while the full Bayes estimator 
(/x„,ri„) is not. 
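The claim that the Bayesian component for $\mu$ does not depend on the variance can be illustrated in the univariate case. The sketch below assumes scalar data $x_i \sim N(\mu,\omega)$ with prior $\mu \sim N(\mu_1,\omega)$, combines prior and data by precision weighting, and compares the result with $(n\bar x + \mu_1)/(n+1)$; the function name and the numbers are made up for illustration.

```python
def posterior_mean_mu(xs, mu1, omega):
    """Posterior mean of mu for x_i ~ N(mu, omega) with prior mu ~ N(mu1, omega)."""
    n = len(xs)
    xbar = sum(xs) / n
    data_precision = n / omega      # precision carried by the sample mean
    prior_precision = 1.0 / omega   # the prior shares the same variance omega
    return ((data_precision * xbar + prior_precision * mu1)
            / (data_precision + prior_precision))


xs = [2.1, 1.7, 2.6, 1.9]           # made-up observations
mu1 = 1.5                           # made-up prior mean
closed_form = (sum(xs) + mu1) / (len(xs) + 1)
estimates = [posterior_mean_mu(xs, mu1, omega) for omega in (0.5, 2.0, 10.0)]
print(closed_form, estimates)
```

Because $\omega$ cancels in the ratio, the same value is obtained for every variance, which is why the Bayesian part can be computed before the variance is estimated.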

Example 5. Using existing results, we give an application of the hybrid
estimator in which neither the full MLE nor the full Bayes estimator
works. Let $x|\alpha$ be the model in [13] with distribution
$P(A|\alpha)$ and density function

$$f(x|\alpha) = (1-\alpha)\frac{1}{\delta(\alpha)}
f_0\Bigl(\frac{x-\alpha}{\delta(\alpha)}\Bigr) + \alpha f_1(x),
\qquad \alpha \in [0,1],$$

where $f_0(x) = (1-|x|)\chi_{[-1,1]}(x)$, $f_1(x) = \chi_{[-1,1]}(x)/2$
and $\delta(\cdot)$ satisfies $\delta(0) = 1$, $0 < \delta(\alpha) < 1-\alpha$
and $\delta(\alpha) \to 0$ as $\alpha \to 1$. Ferguson [13] shows that the
MLE $\hat\alpha_n$ of $\alpha$ is not consistent: $\hat\alpha_n \to 1$
(a.s.) no matter what the true parameter $\alpha_0$ is, if
$\delta(\alpha) \to 0$ fast enough, in particular if
$\delta(\alpha) = (1-\alpha)\exp(-(1-\alpha)^{-c} + 1)$ with $c > 2$.
On the other hand, it is easy to see that for this model Doob's general
conditions (as stated in [31]) for the consistency of the Bayes estimator
are satisfied, as follows: (1) the measurable spaces
$(\mathcal{X},\mathcal{B})$ of $x$ and $(\mathcal{A},\mathcal{U})$ of
$\alpha$ are isomorphic to Borel fields in a complete separable metric
space; (2) for every $A \in \mathcal{B}$, $P(A|\cdot)$ is
$\mathcal{U}$-measurable; (3) if $\alpha_1 \ne \alpha_2$ there exists a
set $A \in \mathcal{B}$ such that $P(A|\alpha_1) \ne P(A|\alpha_2)$;
(4) the prior $\pi(\cdot)$ has finite second moment. Then the Bayes
estimator of $\alpha$ under quadratic loss is strongly consistent
a.e. ($\pi$).
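
To make the density in this example concrete, the following sketch
evaluates it numerically. It is only an illustration: the function names
and the default `c = 2.5` (any value above 2 fits the stated condition)
are assumptions, and the code only handles values of `alpha` for which
`delta(alpha)` does not underflow to zero.

```python
import numpy as np

def delta(alpha, c=2.5):
    """Scale function delta(alpha) = (1 - alpha) * exp(-(1 - alpha)**(-c) + 1)."""
    return (1.0 - alpha) * np.exp(-(1.0 - alpha) ** (-c) + 1.0)

def ferguson_density(x, alpha, c=2.5):
    """Mixture of a rescaled triangular density and the uniform density on [-1, 1]."""
    x = np.asarray(x, dtype=float)
    f1 = (np.abs(x) <= 1.0).astype(float) / 2.0       # uniform part
    if alpha >= 1.0:
        return f1                                     # triangular part has weight 0
    d = delta(alpha, c)
    z = (x - alpha) / d
    f0 = np.clip(1.0 - np.abs(z), 0.0, None)          # triangular density on [-1, 1]
    return (1.0 - alpha) * f0 / d + alpha * f1

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy names."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
```

As `alpha` approaches 1 the triangular component becomes an extremely
narrow spike near `alpha`, which is the feature that drives the MLE
toward 1 in Ferguson's argument.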

On the other hand, let $y_1,\ldots,y_n$ be i.i.d.
$y|\beta \sim U[0,1]\chi(\beta = 1) + U[0,2/\beta)\chi(1 < \beta < 2)$,
the model in Example 2 in [31], with the prior $\pi(\cdot)$ being the
Lebesgue measure on the Borel sets on $[1,2)$ and $U[0,a]$ the uniform
distribution on $[0,a]$. Denote $y_{(n)} = \max_{1\le j\le n} y_j$; then
the MLE $\hat\beta_n$ and the Bayes estimator $\tilde\beta_n$ (under
squared error loss) of $\beta$ are

$$\hat\beta_n = \chi(y_{(n)} < 1) + \frac{2}{y_{(n)}}\chi(y_{(n)} > 1),$$

$$\tilde\beta_n = \frac{n+1}{n+2}\,\frac{2^{n+2}-1}{2^{n+1}-1}\,\chi(y_{(n)} < 1)
+ \frac{\int_1^{2/y_{(n)}}\theta^{n+1}\,d\theta}{\int_1^{2/y_{(n)}}\theta^{n}\,d\theta}\,
\chi(y_{(n)} > 1).$$

Schwartz [31] showed that the MLE is consistent while the Bayes estimator
is not when $\beta = 1$ (although under some special prior the Bayes
estimator can be consistent).
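
The two estimators in this example are simple enough to evaluate
directly. The sketch below is an illustration, not the paper's code: the
function names are hypothetical, the integrals are evaluated in closed
form under the Lebesgue prior on [1, 2), and no care is taken for the
numerically delicate case where the sample maximum is only marginally
above 1.

```python
def mle_beta(y):
    """MLE of beta: 1 if the sample maximum does not exceed 1, else 2 / max."""
    m = max(y)
    return 1.0 if m <= 1.0 else 2.0 / m

def bayes_beta(y):
    """Posterior mean of beta; the posterior is proportional to beta**n on [1, upper)."""
    n = len(y)
    m = max(y)
    upper = 2.0 if m <= 1.0 else 2.0 / m
    num = (upper ** (n + 2) - 1.0) / (n + 2)   # integral of theta**(n+1) over [1, upper]
    den = (upper ** (n + 1) - 1.0) / (n + 1)   # integral of theta**n over [1, upper]
    return num / den
```

When every observation lies in [0, 1], as happens under the true value
beta = 1, the posterior mean equals (n+1)/(n+2) times
(2^(n+2) - 1)/(2^(n+1) - 1), which approaches 2 rather than 1 as n grows,
while the MLE equals 1.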

Now let $x$ and $y$ be independent, let $(x_1,y_1),\ldots,(x_n,y_n)$ be an
i.i.d. sample of $(x,y)$, and let the parameter be $\theta = (\alpha,\beta)$.
Then neither the full Bayes estimator nor the full MLE of $\theta$ is
consistent, while the hybrid estimator (Bayes on $\alpha$, MLE on $\beta$)
is.
Example 6. As a last application, let us consider the problem mentioned
in the Introduction. The data follow a mixture model
$f(x|\alpha,\beta) = \sum_{j=1}^k \gamma_j\phi(x|\alpha_j,\Omega_j)$,
where $\phi(\cdot|\alpha_j,\Omega_j)$ is the density of
$N(\alpha_j,\Omega_j)$. Assume that we have good knowledge on
$\alpha = (\alpha_1,\ldots,\alpha_k)$, as summarized by the prior density
$\pi_j(\alpha_j) \sim N(\alpha_{j0},\Omega_{j0})$,
$\pi(\alpha) = \prod_{j=1}^k \pi_j(\alpha_j)$, but not enough experience
for the parameter $\beta = (\beta_1,\ldots,\beta_k)$,
$\beta_j = (\gamma_j,\Omega_j)$ $(j = 1,\ldots,k)$. So we use a hybrid
estimate with a Bayesian component on $\alpha$ and the MLE on $\beta$.
To estimate the parameters in a mixture model, it is often more
convenient to use a complete data model. For this, let $I_{ij}$ be the
membership indicator of $x_i$, that is, $I_{ij} = 1$ if $x_i$ is from the
$j$th subdistribution, and $I_{ij} = 0$ otherwise. Let
$I_i = (I_{i1},\ldots,I_{ik})$, $y_i = (x_i,I_i)$ and
$y^n = (y_1,\ldots,y_n)$. Treating the $I_i$'s as missing data, given the
"complete data" $y^n$ and $\beta$, the posterior on $\alpha$ is

$$\pi(\alpha|y^n,\beta) \propto \pi(\alpha)\prod_{i=1}^n\prod_{j=1}^k
\bigl(\gamma_j\phi(x_i|\alpha_j,\Omega_j)\bigr)^{I_{ij}} := b(y^n|\theta),$$

and the corresponding logarithm is

$$l(\alpha,\beta|y^n) = \sum_{i=1}^n\sum_{j=1}^k I_{ij}
\bigl(\log\gamma_j + \log\phi(x_i|\alpha_j,\Omega_j)\bigr) + \log\pi(\alpha).$$

Using the 0-1 loss on $\alpha$, its Bayesian solution is the posterior
mode, so we are to maximize $l(\alpha,\beta|y^n)$ over $(\alpha,\beta)$.
As is typical, this leads to an EM algorithm. However, different from the
common EM algorithm, here $l(\alpha,\beta|y^n)$ is not a proper
log-likelihood due to the extra term $\log\pi(\alpha)$. Define
$Q(\theta'|\theta) = E[\log b(y^n|\theta')|x^n,\theta]$ and
$H(\theta'|\theta) = E[\log g(y^n|\theta')|x^n,\theta]$, where
$g(y^n|\theta) = b(y^n|\theta)/a(x^n|\theta)$ and
$a(x^n|\theta) = \pi(\alpha)\prod_{i=1}^n f(x_i|\theta)$. Then
$\log a(x^n|\theta') = Q(\theta'|\theta) - H(\theta'|\theta)$. It is seen
that $g(y^n|\theta)$ is just the conditional density of $y^n$ given $x^n$,
thus $Q(\cdot|\cdot)$ and $H(\cdot|\cdot)$ here play the same roles as they
do in the standard EM algorithm [10], and the hybrid estimator
$\theta_n = (\alpha_n,\beta_n)$ can be evaluated in closed form at each
iteration; the details are omitted here.
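
Since the iteration details are omitted, the following is one possible
sketch of such an algorithm for the one-dimensional case, not the paper's
own implementation. It assumes independent normal priors on the component
means, uses conditional maximization steps (weights, then means with the
variances held fixed, then variances), and all names such as `hybrid_em`,
`alpha0` and `var0` are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def log_objective(x, gamma, alpha, var, alpha0, var0):
    """Observed-data log-likelihood plus the log prior density of the means."""
    dens = (gamma * normal_pdf(x[:, None], alpha, var)).sum(axis=1)
    log_prior = np.sum(-0.5 * (alpha - alpha0) ** 2 / var0
                       - 0.5 * np.log(2.0 * np.pi * var0))
    return float(np.sum(np.log(dens)) + log_prior)

def hybrid_em(x, gamma, alpha, var, alpha0, var0, n_iter=100):
    """EM-type iteration: posterior mode for the means, likelihood steps for the rest."""
    x = np.asarray(x, dtype=float)
    gamma, alpha, var, alpha0, var0 = (
        np.array(v, dtype=float) for v in (gamma, alpha, var, alpha0, var0))
    history = []
    for _ in range(n_iter):
        # E-step: membership probabilities of each point for each component.
        w = gamma * normal_pdf(x[:, None], alpha, var)
        r = w / w.sum(axis=1, keepdims=True)
        nj = r.sum(axis=0)
        # Conditional M-steps, each available in closed form.
        gamma = nj / len(x)
        alpha = ((r * x[:, None]).sum(axis=0) / var + alpha0 / var0) \
            / (nj / var + 1.0 / var0)
        var = (r * (x[:, None] - alpha) ** 2).sum(axis=0) / nj
        history.append(log_objective(x, gamma, alpha, var, alpha0, var0))
    return gamma, alpha, var, np.array(history)
```

Because each conditional step does not decrease the expected complete-data
objective, the recorded objective values should be nondecreasing across
iterations, which is a convenient sanity check when experimenting.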

Below we give simulation results to compare the performances of the
MLE, the Bayes estimator and the hybrid estimator for this example. We
take $k = 3$, the $x_i$'s to be 1-dimensional and $\Omega_j = \sigma_j^2$.
The parameter vector in the model is now
$\theta = (\gamma_1,\gamma_2,\gamma_3,\alpha_1,\alpha_2,\alpha_3,
\sigma_1^2,\sigma_2^2,\sigma_3^2)$. For the Bayes estimator, the prior is
taken as $\pi(\theta) = \pi(\alpha)\pi(\gamma)\pi(\sigma^2)$. Since we do
not have good knowledge on $(\gamma,\sigma^2)$, except that
$0 < \gamma_j < 1$, $\sum_{j=1}^3\gamma_j = 1$ and $0 < \sigma_j^2 < 3$
$(j = 1,2,3)$, we use a noninformative prior on them, that is,
$\pi(\gamma) = \pi(\gamma_1)\pi(\gamma_2|\gamma_1)$ with
$\pi(\gamma_1) \sim U(0,2/3)$, $\pi(\gamma_2|\gamma_1) \sim U(0,1-\gamma_1)$,
$\gamma_3 = 1-\gamma_1-\gamma_2$ and $\pi(\sigma^2) \sim U[0,3]^3$. To
distinguish the Bayesian estimator from the hybrid estimate, we use the
squared error loss for the former, which has no closed form, and we use
Markov chain Monte Carlo sampling

Table 1
Simulation results for three estimators

n     Estimator  γ1       γ2       γ3       α1       α2       α3       σ1²      σ2²      σ3²
      θ0         0.190    0.540    0.270    -0.850   0.220    1.350    0.450    0.200    0.860

100   MLE        [row not recoverable from the source]
      Bayes      0.280    0.386    0.334    -0.649   0.204    1.490    0.539    0.177    0.722
                 (0.158)  (0.129)  (0.146)  (0.055)  (0.089)  (0.051)  (0.188)  (0.345)  (0.093)
      Hybrid     0.277    0.427    0.296    -0.819   0.210    1.637    0.348    0.104    0.575
                 (0.168)  (0.135)  (0.169)  (0.071)  (0.149)  (0.059)  (0.370)  (0.649)  (0.131)

300   MLE        0.201    0.531    0.267    -0.905   0.239    1.371    0.454    0.169    1.049
                 (0.107)  (0.069)  (0.090)  (0.027)  (0.076)  (0.022)  (0.132)  (0.204)  (0.062)
      Bayes      0.173    0.563    0.264    -1.005   0.232    1.362    0.440    0.198    1.181
                 (0.115)  (0.068)  (0.089)  (0.024)  (0.072)  (0.020)  (0.141)  (0.170)  (0.050)
      Hybrid     0.201    0.531    0.268    -0.900   0.237    1.373    0.447    0.167    1.021
                 (0.108)  (0.069)  (0.090)  (0.028)  (0.077)  (0.022)  (0.140)  (0.209)  (0.065)

1000  MLE        0.201    0.533    0.266    -0.797   0.227    1.323    0.420    0.202    0.824
                 (0.055)  (0.037)  (0.049)  (0.017)  (0.037)  (0.014)  (0.095)  (0.089)  (0.042)
      Bayes      0.185    0.556    0.259    -0.885   0.215    1.314    0.405    0.220    0.883
                 (0.059)  (0.036)  (0.050)  (0.016)  (0.038)  (0.013)  (0.104)  (0.080)  (0.037)
      Hybrid     0.202    0.531    0.267    -0.801   0.227    1.326    0.416    0.199    0.811
                 (0.055)  (0.037)  (0.049)  (0.017)  (0.038)  (0.014)  (0.097)  (0.091)  (0.043)

to compute it. The results are given in Table 1, in which $\theta_0$ is
the true parameter value and the estimated standard deviations are in
brackets. We see that when the sample size is relatively small
($n = 100$), the hybrid estimator has better performance on $\alpha$,
which may be due to the good knowledge on it. The Bayes estimator behaves
better only on $\alpha_3$. As the sample size increases, the performances
of the three estimators are close, as anticipated.

Some pros and cons of each method. Here we discuss some advantages
and disadvantages of the Bayesian and frequentist methods in parametric
inference. These known facts can help in practice in selecting the
method to use. Our list is far from complete.

Unbiasedness consideration. It is known [6, 29] that there are essentially 
no unbiased Bayes procedures. 

Small-sample advantage. When good prior knowledge about the parameters
is available, the Bayesian estimate often has better small-sample
performance than the frequentist one, due to the information in the prior.
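
The small-sample remark can be made concrete in the simplest scalar
setting. The sketch below is an illustration under assumptions that are
not in the text: the data are i.i.d. N(mu, sigma2) with sigma2 known, the
prior on mu is N(mu1, sigma2), so the posterior mean is
(n * xbar + mu1) / (n + 1), and the function names are hypothetical.

```python
def mse_mle(n, sigma2):
    """Mean squared error of the sample mean, which is unbiased."""
    return sigma2 / n

def mse_bayes(n, sigma2, mu, mu1):
    """Mean squared error of the posterior mean (n * xbar + mu1) / (n + 1)."""
    bias = (mu1 - mu) / (n + 1)
    variance = n * sigma2 / (n + 1) ** 2
    return variance + bias ** 2
```

In this setting the posterior mean has the smaller mean squared error
exactly when (mu1 - mu)^2 < sigma2 * (2 + 1/n), so the advantage depends
on how close the prior mean is to the truth and vanishes in relative
terms as n grows.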

High-order behavior. Since the Bayes estimator, the MLE and the hybrid
estimator have the same first-order performance, if we want higher
standards to select among them, usually their second-order terms will be
evaluated, such as in Example 4.

Prior selection. In some cases we don't have sufficient knowledge for the 
prior on part of the parameters. Although one may use a noninformative 
prior [21] on these parameter components, so that a full Bayesian analy- 
sis can be performed, this often pays the price of small-sample bias and 
computational complexity. 

Feasibility. Sometimes it is difficult to implement a full Bayesian or a
full frequentist analysis on all the parameters of interest, but
relatively easy to do so for part of the parameters by one of the
methods. For example, in a multi-parametric model [14], some of the
parameters are the change points; the model is nondifferentiable at these
points, and computing the MLE on this part of the parameters is
infeasible.

Multidimensionality and nuisance parameters. In some models with
high-dimensional parameters or in the presence of nuisance parameters,
either the Bayes or the frequentist estimate may be difficult to compute.
Various methods ([7, 15, 28], etc.) have been studied for this problem.
A proper hybrid formulation may be among the options.

APPENDIX 

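Regularity conditions. Throughout this paper we assume the densities
are with respect to the Lebesgue measure. In the following, conditions
(A1)-(A3) are A2.1, A2.6 and A2.7 in [5]:

(A1) $\theta$ belongs to an open subset of $R^d$.
(A2) Let $l(x|\theta)$ be the log-likelihood. Assume
$\partial l(x|\theta)/\partial\theta$ and
$\partial^2 l(x|\theta)/(\partial\theta\,\partial\theta')$ exist and are
continuous in $\theta$ for almost all $x$.
(A3) $E_\theta(\sup_{\eta\in\Theta}
\|\partial^2 l(x|\eta)/(\partial\theta\,\partial\theta')\| :
\|\eta-\theta\| < \epsilon(\theta)) < \infty$ for some
$\epsilon(\theta) > 0$ and all $\theta \in \Theta$.

Let $P_\theta$ be the data distribution given $\theta \in \Theta$, and
$l_n(x^n|\theta) = \frac{1}{n}\sum_{i=1}^n l(x_i|\theta)$.
Conditions (A4)-(A9) below are those of (1)-(6) in [34].

(A4) The metric space $(\Theta,d)$ is separable, where
$d(\theta,\eta) = \|P_\theta - P_\eta\|$.
(A5) The functions $(l_n(\cdot|\theta))_{\theta\in\Theta}$, $n \in N$, are
separable and measurable.
(A6) $f(\cdot|\theta)$, $\theta \in \Theta$, are lower semicontinuous,
that is, $\limsup_{\theta_n\to\theta} f(\cdot|\theta_n) < f(\cdot|\theta)$
(a.e.) if $d(\theta_n,\theta) \to 0$.
(A7) For every $\theta,\eta \in \Theta$, there is an open neighborhood
$U_{\theta,\eta}$ of $\eta$ such that
$E_\theta(\inf_{\theta'\in U_{\theta,\eta}} l_n(x^n|\theta')) > -\infty$
for at least one $n$.
(A8) For every $\theta \in \Theta$ and $\epsilon > 0$,
$\Pi(\eta \in \Theta : E_\theta l(x|\eta) < E_\theta l(x|\theta) + \epsilon) > 0$,
where $\Pi(\cdot)$ is the distribution for $\pi(\cdot)$.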






(A9) For every $\theta \in \Theta$ there is some $n_\theta$ such that
$P_\theta(x^n : \int\prod_{i=1}^n f(x_i|\eta)\,\Pi(d\eta) < \infty) = 1$
if $n > n_\theta$.

Conditions (B1)-(B10) are multivariate versions of those of 1-10 in [17].
(B1) For $\theta \ne \eta$, $\int |f(x|\theta) - f(x|\eta)|\,dx > 0$.

(B2) For some $p_1 > 0$ and some compact set $K \subset \Theta$,
$\sup_{\theta\in K}\|\theta-\eta\|^{p_1}\int\sqrt{f(x|\theta)f(x|\eta)}\,dx < \infty$.
(B3) $f(x|\cdot)$ is continuous on $\Theta^c$, the closure of $\Theta$,
and has $k+2$ $(k \ge 1)$ continuous derivatives on $\Theta$.
(B4) (a) For some 6 > 0, and every compact K G 0, sup^gK -^0l|L(x| 

0)||3V(fc+l+fe) <oo. 

(b) For every compact K G 0, maX]^<|i|<;(,supggj^ £'0||Li(x|0)||*''+^ < 
oo. 

(c) For every compact K G 0, and for some ei(K) > 0, 

maX|i|=fc+iSUPegK,||0-r,||<ei{K)-E^0||Li(x|r/)||(''+^)/2 ^oo. 

(B5) (a) For some $p_2 > 0$,
$\sup_{\theta\in\Theta}(1+\|\theta\|^{p_2})^{-1}\|I(\theta)\| < \infty$.

(b) $I(\theta)$ is positive definite for $\theta \in \Theta$.
(B6) $\pi(\cdot)$ has $k$ continuous derivatives on $\Theta$.
(B7) For some $p_3 > 0$,
$\sup_{\theta\in\Theta}(1+\|\theta\|^{p_3})^{-1}\pi(\theta) < \infty$.
(B8) $W(\cdot) > 0$ is convex, that is, for any $t \in [0,1]$ and $u_1$
and $u_2$, $W(tu_1 + (1-t)u_2) < tW(u_1) + (1-t)W(u_2)$.
(B9) For some $a = (a_1,\ldots,a_d)'$ with each $a_j > 1$ an even integer
$(j = 1,\ldots,d)$ and $\delta_2 > 0$,
$W(\theta) = \sum_{j=1}^d \theta_j^{a_j}$ for $\|\theta\| < \delta_2$.
(B10) For some $p_4 > 0$,
$\sup_{\theta\in R^d}(1+\|\theta\|^{p_4})^{-1}W(\theta) < \infty$.

Note that for the 1-dimensional case in [17], (B9) reads: for some real
$a > 1$ and $\delta_2 > 0$, $W(\theta) = |\theta|^a$ for
$|\theta| < \delta_2$. Here we require $a$ to be a componentwise even
integer, otherwise the computation will be unnecessarily involved.

Proof of Theorem 2.1. There is a compact $\Omega' \subset \Omega$ such
that $\beta_0 \in \Omega'$. Define a prior on $\beta$ as
$\pi(\beta) = 1/L(\Omega')$ for $\beta \in \Omega'$ and $\pi(\beta) = 0$
for $\beta \in \Omega\setminus\Omega'$, where $L(\cdot)$ is the Lebesgue
measure on $R^{d_2}$. Then define the prior
$\pi(\theta) = \pi(\alpha)\pi(\beta)$ and the decision $d = (d_1,d_2)$ for
$(\alpha,\beta)$. Assign the loss
$W(d,\theta) = W(d_1,\alpha) \times V(d_2,\beta)$, where, for small enough
$\delta > 0$, $V(d_2,\beta) = V(\|d_2-\beta\|) = 0$ if
$\|d_2-\beta\| < \delta$ and 1 otherwise. Here, without confusion, we use
$W$ to denote both the loss on $\alpha$ and that on $\theta$. Denote by
$\pi(\theta|x^n)$ the posterior density for $\theta$ under the above prior
and by $R_n = \int W(d,\theta)\pi(\theta|x^n)\,d\theta$ the posterior risk
under the new prior and loss for $\theta$. Then from (2.1), the hybrid
estimate $\theta_n$ is the Bayes estimate of $\theta$ under the above new
prior and loss. To see this, the Bayes estimate under the new setting is

$$\arg\min_{d}\int W(d,\theta)\pi(\theta|x^n)\,d\theta
= \arg\min_{(d_1,d_2)}\int\!\!\int W(d_1,\alpha)V(d_2,\beta)
\pi(\alpha,\beta|x^n)\,d\alpha\,d\beta$$
$$= \arg\min_{(d_1,d_2)}\int\!\!\int_{\|\beta-d_2\|>\delta}
W(d_1,\alpha)\pi(\alpha,\beta|x^n)\,d\alpha\,d\beta$$
$$= \arg\min_{d_1}\max_{\beta}\int W(d_1,\alpha)f(x^n|\alpha,\beta)
\pi(\alpha)\,d\alpha = (\alpha_n,\beta_n),$$

as the integration over $\beta$ is minimized when $d_2$ is the
$\beta$-marginal posterior mode, in this case the $\beta$-marginal MLE,
while the first integration is minimized by the corresponding marginal
Bayes estimator. The $\delta > 0$ above can be arbitrary and the result
does not depend on it.

Under the given conditions, by Lemma 2.1 in [5], the MLE of $\theta$ is
consistent (a.s.); thus by Theorem 2.5 in [34], for any compact
$M \ni \theta_0$,

$$P\bigl(\liminf_n \Pi(M|x^n) = 1\bigr) = 1,$$

where $\Pi(\cdot|x^n)$ is the posterior distribution under the new prior
$\pi(\theta)$.

Let $\theta_0 = (\alpha_0,\beta_0)$ be the true parameter value. By the
given conditions on $W(\cdot)$, for $\epsilon > 0$ there is a $\delta > 0$
such that $W(\|\alpha_0-\alpha\|) < \epsilon$ as long as
$\|\alpha_0-\alpha\| < \delta$. Let $M = (\theta_0 \pm \delta 1)$ be the
$\delta$ neighborhood of $\theta_0$; then
$\sup_{\theta\in M} W(\theta_0,\theta) < \epsilon$. Since
$\pi(\cdot|x^n) \to 0$ (a.s.) on $M^c$ and can be dominated there for
large $n$, for large $n$ we have
$\int_{M^c} W(\theta_0,\theta)\pi(\theta|x^n)\,d\theta < \epsilon$. Since
$\theta_n$ is Bayesian under the new prior and loss, (2.1) is rewritten as

$$\theta_n = \arg\min_{\delta}\int W(\delta,\theta)\pi(\theta|x^n)\,d\theta.$$

So, for large $n$, we have the posterior risk

$$R_n = \int W(\theta_n,\theta)\pi(\theta|x^n)\,d\theta
< \int W(\theta_0,\theta)\pi(\theta|x^n)\,d\theta$$
$$= \int_M W(\theta_0,\theta)\pi(\theta|x^n)\,d\theta
+ \int_{M^c} W(\theta_0,\theta)\pi(\theta|x^n)\,d\theta$$
$$< \int_M W(\theta_0,\theta)\pi(\theta|x^n)\,d\theta + \epsilon
< \epsilon\int\pi(\theta|x^n)\,d\theta + \epsilon = 2\epsilon.$$

Since $\epsilon > 0$ is arbitrary, we have $R_n \to 0$ (a.s.).

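Suppose $\theta_n$ is not consistent (a.s.) to $\theta_0$, or
$\limsup\|\theta_n-\theta_0\| > \epsilon$ (a.s.) for some $\epsilon > 0$.
Then there is a subsequence $\{n_k\}$ such that either
(a) $\lim_k\|\alpha_{n_k}-\alpha_0\| > \epsilon/2$ and
$\lim_k\|\beta_{n_k}-\beta_0\| > \epsilon/2$; or
(b) $\lim_k\|\alpha_{n_k}-\alpha_0\| > \epsilon$ but
$\|\beta_{n_k}-\beta_0\| \to 0$; or
(c) $\|\alpha_{n_k}-\alpha_0\| \to 0$ but
$\lim_k\|\beta_{n_k}-\beta_0\| > \epsilon$.
For case (a), let $M = \{\theta : \|\theta-\theta_0\| < \epsilon/2\}$;
then $M \subset M_1 := \{\theta : \|\alpha-\alpha_{n_k}\| > \epsilon/2;
\|\beta-\beta_{n_k}\| > \epsilon/2\}$. Note that for
$\theta = (\alpha,\beta) \in M_1$,
$W(\|\alpha-\alpha_{n_k}\|) > W(\epsilon/2)$, and
$\Pi(M|x^{n_k}) \to 1$ (a.s.) by the previous result. Also, by our choice
of the prior on $\beta$, we have for some $0 < c < 1$,

$$\int_{M_1} V(\beta_{n_k},\beta)\pi(\theta|x^{n_k})\,d\theta
> \int_{M} V(\beta_{n_k},\beta)\pi(\theta|x^{n_k})\,d\theta
> c\,\Pi(M|x^{n_k}).$$

So we have, for all $n_k$,

$$R_{n_k} > \int_{M_1} W(\theta_{n_k},\theta)\pi(\theta|x^{n_k})\,d\theta
> W(\epsilon/2)\int_{M_1} V(\beta_{n_k},\beta)\pi(\theta|x^{n_k})\,d\theta$$
$$> W(\epsilon/2)\int_{M} V(\beta_{n_k},\beta)\pi(\theta|x^{n_k})\,d\theta
> cW(\epsilon/2)\Pi(M|x^{n_k}) \to cW(\epsilon/2) > 0,$$

which is a contradiction to the fact that $R_{n_k} \to 0$ (a.s.), and so
(a) cannot be true.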

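For case (b), let $\Pi_{1,\beta_{n_k}}(\cdot|x^{n_k})$ be the
$\alpha$-marginal posterior distribution with $\beta$ evaluated at
$\beta_{n_k}$. Similarly as before, let $M = (\alpha_0 \pm \delta 1)$.
Since $\beta_{n_k} \to \beta_0$, we have
$P(\liminf_k \Pi_{1,\beta_{n_k}}(M|x^{n_k}) = 1) = 1$, and as before, the
posterior risk

$$R_{n_k} := \int W(\alpha_{n_k},\alpha)\pi(\alpha,\beta_{n_k}|x^{n_k})\,d\alpha$$
$$< \int_{M} W(\alpha_0,\alpha)\pi(\alpha,\beta_{n_k}|x^{n_k})\,d\alpha
+ \int_{M^c} W(\alpha_0,\alpha)\pi(\alpha,\beta_{n_k}|x^{n_k})\,d\alpha$$
$$< \epsilon\int\pi(\alpha,\beta_{n_k}|x^{n_k})\,d\alpha + \epsilon < 2\epsilon,$$

that is, $R_{n_k} \to 0$. On the other hand, for the fixed $\epsilon > 0$,
let $M = \{\alpha : \|\alpha-\alpha_0\| < \epsilon/2\}$; then
$M \subset M_1 := \{\alpha : \|\alpha-\alpha_{n_k}\| > \epsilon/2\}$.
We have

$$R_{n_k} > \int_{M_1} W(\alpha_{n_k},\alpha)\pi(\alpha,\beta_{n_k}|x^{n_k})\,d\alpha
> W(\epsilon/2)\Pi_{1,\beta_{n_k}}(M|x^{n_k}) \to W(\epsilon/2) > 0,$$

a contradiction.

By a similar argument on the $\beta$-margin, it is seen that case (c)
cannot be true. □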

Proof of Theorem 2.2. We check the main steps, for the vector ver- 
sion, of the proof of Theorem 1 in [17]. Lemmas 1-6 of Gusev [17] there can 

be checked similarly. Let 0„ = y/n{9n — Oq). By definition of On, we have 

(A.l) So(0n)+n-l/Vo(^n)=O. 



Also, 
r=0 |i|=r ^' 

r=0 \|i|=r ■ |i|=r+l ' / 



and 



n-^/Vo(^n)^E--'^/' E ^Pi- 

r=l |i|=r-l 

In the above, we used the relationship Si = A; + n^/^E;, and Eq = 0. 

Note I = — (Ei : |i| = 1)', the Fisher information matrix, so X]|i|=i((^n.)')-Ei = 
—lOn- Now, we rewrite (A.l) as 

Ao-I^l + En-^/^f E ^Pi+E^Ai+ E ^E,' 

r=l \|i|=r--l ^' |i|=r ^' |i|=r+l ^' J 

(A.2) 

~0. 

Consider, with t = 0, the term Es+i=r E|i|=t+i Ei E|i|=s E(o,i,i) ffi ^XT • 
Note i can only take one of the vectors e^'s. First we take i = ei, since 
|1| = r it is easy to check that the only nonempty integer vector sets satisfy 
the definition of E(o,i,i) ~ ^^^i}, and the only i„ in this set is {i^ = ei}. 
Similarly, for i = e2, the only nonempty integer vector sets satisfy the defi- 
nition of E(o,i,i) ~ i"^2}, and the only i„ in this set is {i^ = 62}, . . . , so 
we have, with t = 0, 

E E EiE E n^=EEiE E fl^'"^ 



s+t=r\i\=t+l \l\=s{0,s,\,i)v=0 |i|=l |l|=r (0,s,I,i)iJ=0 



= EEiV = -IH 

|i|=i 

Thus, the expression for in Theorem 2.2 is rewritten as 



r- 



(A.3) 



E f E aE E n^ + EAiE E 

s+t=r- \|i|=t-l |l|=s(0,s,l,i)?^=0 ^' \i\=t |l|=s(0,s,l,i)t)=0 

+ E EiE E n^)~ 

|i|=t+l |l|=s(0,s,l,i)«=0 '"■ / 
Set = YHZ^ H^n-^/^ := {O'l,,, . . . , C)', so {{elf) = WUiK^P ■ 

For integers r >a and integers m, Z > 0, Ii{a,r,'m,l) denote the set of 
nonnegative integers (ii, . . . , i^), 

{r ^ 
=m, ^i„ = n. 
v=a v=a ) 

Write Hi = . . . , and i = (n, . . . , id)'. Note C,j = Er=d ?^"'/'^r,i, 

and 

r-O 7i(0,r,r,ij) f=0 

It can be checked that 

n E ri^^:.A.! = E E ri(Hjr)/i'^!> 

i=l /i(0,r,r,ij) v=0 |l|=r (0,r,l,i) v=Q 

thus 

r=0 j=l/j(o,r,r,ij)t'=0 

= i!gn-^/^E E n(Hi")/iJ. 

r=0 |l|=r(0,r,l,i)tJ=0 

Note (A. 2) still holds with 0'^ replaced by 0'^, and using (A. 3) we get 
Ao-ieUT.n~M E ^Pi+E^^i+ E ^E,^ 

r=l \|i|=r-l ^' |i|=r ^' |i|=r+l ^" I 

r=0 \|i|=r-l ■ ji|=r ' |i|=r+l ' / 

E PiE E n^T^ 

r=Os=0 \|i|=r-l |l|=s(0,s,l,i)t)=0 ^' 

+ EAiE E 

|i|=r |l|=s(0,s,l,i)«=0 ^' 

(A.4) + E EiE E n^) 

|i|=r+l |l|=s(0,s,l,i)i)=0 / 
r=0 s+t=r \|i|=t_l |l| = <i(0,s,l,i)f=0 



1„! 



+ EAiE E 

ji|=t |l|=s(0,s,l,i)f=0 

+ E EiE E ri^)-o. 

Now (A. 2) minus the left-hand side of (A. 4), similarly as in [17], we get 

^ \(e'^ - K) + o{n^i^)\(e'^ - K) = (i + o{n-^i^))\{h'^ - el). 

Thus, 0^ ~ and Theorem 2.2 is proved. □ 
Recall 0^ = ^Jn(Qn — So), and define 

We first extend a result in [17] to the multivariate case. 

Lemma 1. Under the conditions of Theorem 2.2, we have 

^n(y„j V ^ / \ r=l |i|=2 / 

where 

r v+2 p'^'i 



^^i . = En E n 



/2(r,i)''=l/i(2,t;+2,k„,i,) |j|=2 J'^J''' 

in the above, the summations are for i^ G lQ{l,r,r), k^, G l2{r,i), and for 
each fixed V, Uj € Ii{2,v + 2,'ky,iy), with the notation /o(l, r,r), l2(r, i) and 
/i(m, r, k^, i„) given at the end of the proof, and 

Fi,r= E E ^-jE E ri(Hi")/i^! 

t+s=r,t>lj>i,\j\=t |l|=s(0,s,l,j-i)i'=0 

+ E E ^jE E limn^^- 

t+s=r+l,t>2 j>i,|j|=t |l|=s (0,s,l,j-i) ?^=0 

+ E E ^jE E n(Hi^)/i^!- 

t+s=r+2,t>3 j>i,|j|=t |l|=s (0,s,l j-i) ^-=0 
Proof. As in the proof of Theorem 2 in [17], we have 

r=0 |j|=r+l J' 

r=l |j|=r J' 

-2^™ 2^ '^jZ^ i'(i-i)' 

r=0 |j|=r+l i<j ^ 



r=l |j|=r i<j ^"^j 



and 



r=0 jj|=r+l J- r=l |jl=r J' 

These give 



Zn{On) r=Q \\\=r+l i<j,|i|=l '^^ 



\i\=r+l i<j,|i 



^ fi-i)! 

'^=1 |j|=r i<j,|i|=i ^ 

+ E--/^ E E ^^1^ 

r=l |j|=r-+l i<j,|i|>2 '^-^ 

+ E^"'^/'E^'j E ^^'^^^^"^'"^ 



r-=2 |j|=r i<j,|i|>2 



Recall for lil = 1 we have 



O = 5i(0„) + n"^/2^i(0„) 

k 






and 



r=l |j|=r-l J' 



SO the first two summations in the right-hand side of (A. 5) together is 

|i|=l \r=0 |j|=-r r=l |j|=r-l ' 

Using 5j = (5j + \/n<£j, (A. 5) is now 



log- 



Zn{(^n) r=2 \j\=r i<j,|i|>2 



(A.6) 



^=1 UN^'+l i<j,|i|>2 ^"^ ^ 

r=0 |j|=r+2 i<j,|i|>2 ' 



r=l \j\=r i<j,|i|>2 ' 

r=l |j|=r-+l i<j,|i|>2 '^-^ '' 

'•=1 |j|=r-+2 i<j,|i|>2 ''^-^ |j|=2 

In the above we used the fact that for r = 1, Z)|j|=r Ei<j,li|>2 • • • = 0, as it 
is a summation over empty set, thus Z)r=2 "-"'"^^ S|j|=r ^?j Si<j,|i|>2 ' ' ' c:^^ 
be rewritten as X]r=i S|j|=r I]i<j,|i|>2 ' " " • Note 

Also, as in the proof of Theorem 2.2, 

A:— 1 r 

((<,)M~((^nr')~(j-i)!E^"'^'E E n(H;")A-'- 

r=0 |l|=r(0,r,lj-i)t)=0 
Plugging in the above into (A.6) and rearranging terms, we get 



~ -e'w/2 

k-l 



,1 11 ; 1 

r=l \t+s=r,t>l |i|=2 ■ j>i,|j|=t |l|=s {0,s,l,j-i) »'=0 

+ E Ef^ E ^jE e 

f+s=r+l,t>2|i|=2 ■ j>i,|j|=t |l|=s{0,s,l,j-i)f=0 

+ E E^ E ^jE e 

f+s=r+2,t>3 |i|=2 ■ j>i,|j|=t |l|=s (0,s,l,j-i) t)=0 ^" . 



-e'w/2 

k~l r+2 



+ E--^/^EY ^ E ^jE E 

^=1 |i|=2 ■ Ws=r-,t>lj>i,|j|=t |l|=s{0,s,lj-i)t^=0 



+ E E ^jE E 

t+s=r+l,t>2j>i,|j|=t |l|=s(0,s,l,j-i)t'=0 

+ E E ^jE E n^) 

t+s=r+2,t>3j>i,|j|=t |l|=s(0,s,lj-i)t'=0 ^' / 

fc-1 r+2 

= -Vi0 + E--^/^E^^i.- 

r=l |i|=2 

In the above we used the fact that, for |i| > r + 1, the summation 
Et+s=r Z]j>i,|j|=tZ]|i|=s is over an empty set, thus the first term inside the 
bracket above originahy is J2r=i^~^^'^^t+s=rJ2\i\=2{^^)/'^^-"' ^'^d 
rewritten as Z]r=i '^"''''^ X]|i|'^2^^')/^' " " ' • ^'-'^ same reason, the second 
term inside the bracket above originally is J2r=i Z]t+s=r+i Z]|i|=2(^')/i' " ' ' 
and can be rewritten as Y^r=i "'^"^ ^'^Y^\^=2^^^) 1'^^- ' ' '• Now as in [17] we have 

^exp(--010jexpn n / )^ F;,, , 

^n(,t/„j ^ ^ \r=l |i|=2 / 
/fc-1 r+2 /ai\ \ k-1 r / v+2 ,pi\ 

\r=l |i|=2 ■ / r=l Ioil,r,r)v=l\\i\=2 ' 



1 



the second summation on the right-hand side above is for {ii,...,ir) € 
/o(l, r,r). Also, 



E n E^^^u 



= E n E i^"-') E n^=E(^')A.; 

/o(l,r-,r)?^=l |k„|=2j„ 7l(2,D+2,k„,i„) |j|=2 J'^''''' |i|=2 

in the left-hand side above the summations are for G lQ{l,r,r), 

and for each given v and k^,, Uj G /i(2, v + 2, k„, i^,). Now we have 

/fe-l r+2 \ fc-1 3r 

exp 5: n-/2 E ^^i.O ' 1 + E E 

\r-=l |i|=2 ^" / ■r=l |i|=2 

In the definition of Lir, Ioi^,r,r) = 1J;>q r, r, where r,r,/) is 
defined in the proof of Theorem 2.2. For given G /o(l, r,r), and 

integer d- vector i, l2{f', i) is the collection of integer d- vectors ki, . . . , k^, 

l2(r,i) = |(ki,...,kr):^|k„| =i, < \ky\ < {v + 2)i^ , i; = l,...,r|. 

Given integer d-vector k, integers i and m<r, /i(m,r, k, i) is the collection 
of integers uj indexed by a integer d- vector j , 



□ 



h{m, r,'k,i) = <uy. ^ juj = k, ^ = i L 

I |jl=m |j|="i 

Proof of Theorem 2.3. By Theorem 2.2, we only need to prove 

fc— 1 

(A.7) V^iOn - ^n) = E ""'/'Qr + Op(n-'=/2)_ 

r=l 

Denote W^^\0) = {dW{e)/dei, . . . ,dW{0)/d9dy , d„ = V^(0„, - 0„,) and 
0„ = y/n{6n — Oq)- We only need to point out the main modifications to the 
proof of Theorem 4 in [17]. In place (4.5) of [17] there we have 



n 
corresponding to (4.7) of [17] there we have 

||wW(u)|| < (j2{W{u + ei) + W{u) + W{u-ei)) 



\ 1/2 
2 1 



a.s. 



\i=l 



By Condition 9, for ||(d„ — 6)/^/n\\ < 62, we can replace W(^)((d„ - 0)/^^) 
by the vector a((d„ — 6) / ^/n)^"^ , and (4.10) of [17] there is replaced by, for 
6>0, 



[ {dn - e) 



e|l<"V2 Z„(6>„) 

As in [17], d„ ~ 0. Define iVo,o = 1, thus 1 = n-^/'^{e°)Nofi, let |I| be the 
determinant of I, by Lemma 1 the above is 



k-l 3r 

k 



0^^n-/2 Y: (2vr)'^/2|I|-V2Ar.^ 

r=0 |i|=2{lAr) 

Also, as in [17], for each i and A; > 0, 

J||6»|l>n*/2 (27r)'^/2 V 2 / 
Define the R'^ to R'^ function *i(-) = (V'i,i(-)> • • • , V'i,d(-))' as 

*i(u) = I e^-^ae + u)') exp (-^(0 + u)'i(0 + u)) de, 

and *P(-) = (Vg(-), • • ■,i^^li-)y with Vg(u) = al-ilVi,fc(u)/auJ (A: = 1, . . . , d). 

k 

On the right-hand side of the "0 ~" relationship above, leave out the 
constant (27r)'^/^|I|"^/^ and multiply the nonsingular matrix ({cr(a)}I)~^, 

k 

the "0 ~" relationship remains. The choice of this matrix will be clear when 
we prove the expression for <r < k — 1) later. We will show below 

that *p^(0) = 0, for |i+j| even, so the previous relationship is rewritten as 

k—l 3r 

0^5]n"^/2 iVi,.({'T(a)}I)-i*i(d„) 

r=0 |i|=2{lAr) 

(A.S) ^'^'n-/^ Y N,r'~Y ^^i{cr{^)}ir'^f\0) 



r=0 |i|=2(lAr) |j|=0 



J 
^n-/2 J2 (4) E fiVi,r({'T(a)}I)-i*SjV) 



r=0 
fc-l 

r=0 
fc-1 



|j|=0 
fc— 1— r 



|i|=2{lAr) ■ 
3r+|j| 



1 



E i<) E TyiVs-j,.({^(a)}I)-^*i^ij(0) 



|j|=o 

fc-l-r 



|s|=2(lAr)+|j|' 



E--^/^ E K) E 



r=0 

E 



n 



|j|=o 



|s|G(2(lAr) + |j|,3r+lj|> 



iiV^_j,,({^(a)}I)-i*2j(0) 



r=0 |j|=0 

where {cr(a)} is the d x d diagonal matrix with rth element be the a^th 
moment for Or [with 6 ~ A^(0,I~^)]. Denote I = (ist), = using the 
joint moment formula for multivariate normal distribution, we have 

V'i.fcio) = / er\e^)e.p(^-\e'ie^ do 

0, if |i| even, 

<T((a — l)eyfc + i), otherwise, 

where cr(a) = E{6^) is the joint ath moment of ~ A^(0,I^^). Note 



2duk 
92(61 + u)'I(6l + u) 



2duk dur 



u=0 



u=0 



r=l 

^k^^r — ^kr 



and 9l'l[(0 + u)'I(0 + u)]/au'|u=o = 0, for |i| > 2, we have 



(2^)"'/2 



i+(a-l)efc 



where -fi+'*(a-i)efc multivariate polynomial in 6 given by the rela- 

tionship [note el'^-^ = (6t('^-i)^'=)] 

d\i\ [ef-\{e + u)') exp(-l/2(6> + u)l(6> + u))] 



9uJ 



u=0 



and ^'i+(a_i)e,('^) = ^(-Pi?(a-i)efc(^)) ^i^h e ~ Af(0,I-^) is the correspond- 



ing vector polynomial in the moments fx^j's. Especially, ^oi(^) ~ 
1,. . . ,d), for |j| even, or ^'q^^(O) = 0, for |j| even, and it can be shown that 
= 0) if |i + j| is even. We get 

3r 



1 



|i|=2(lAr) J- 

and Mj J. = for |i + j| even. Note the recursive relationship for the M; ^.'s 
in Theorem 2.3 can be rewritten as 

r m m 

(A.9) E E E n (Ql")A-! = 0' l<r<k-l. 

m=0|i|=0 |l|=m(l,m,l,i)i;=l 

To see this, let < = Et!ri-^/^Qj-; as in the proof of Theorem 2.2, we have 



(«r>'i!E-"^/'E E EiQir)nvi 



■ (l,r,l,i) -0=1 



Denote PSi_i(^) = (P/Vi)e, (^), • • • , ^'/Vi)e.(^))'- ^^ke i = in 
above, we have J2\\\=oJ2{i,o,i,o)Ilv=i{Qv)/'^v^- = 1 symbolically. Note Mo,o = 

0, and recah we defined iVo,o = 1, so Me,,o = iVo,o({o-(a)I})-^Pi''_:l(o-) = 
— ({cr(a)}I)~^ X ({cr(a)}I)r, where ({cr(a)}I)r is the rth column of {cr(a)}I; 

k 

this is the reason we multiply it in the previous ~ relationship, and we 
see that {a(a)}I = -(P£}(a), . . .,P^^l{a)). Note E.ti({'T(a)}I),(Qr ) = 
({o-(a)}I)Qi. So, for r = 1, (A.9) is 

1 1 
= Mo,i + ^ Mi,o E E (Qi)=Mo,i+ 5]Mi,o(Qi) 

|i|=0 |1|=1 (l,l,l,i) |il=0 

d 

= Mo,i + Mo,o + E Me.,o(Qr ) = Mo,i - Qi; 

r=l 

this gives the equivalence of (A.9) and the expression for Qi given in The- 
orem 2.3. 

For r > 1, since Mj o = for |j| even, (A.9) is rewritten as 

1 — 1 rn m 

0=^ 5:i!Mi„ Yi E nw)/i-! 

m=0|i|=0 |l|=m(l,m,l,i) f=l 

+ J2 i!Mi,oE E n(Ql")/i.! + EMi,oE E IliQi')/^^'- 

|i|G{3,r> |l|=r(l,r,l,i)i'=l |i|=l |l|=r (l,r,l,i) i;=l 



the 
As before, I]|i|=oZ](i,oi,o) I\v=i{Q^o) /^v^- = ^''^^^ term above is Mo,r + 

E;n=lE^l|=li!Mi,,_^E|l|=„^E(l,„^,l,i)^™=l(Ql")/i^-!• For the second term, 
since r > 1, 3 < |i| < |1|, the set (l,r, l,i) is empty if ir ^ 0, thus the factor 
nv=i(Qi)")/i«' i'^ the second term is in fact lYvZi{Ql")/ivl- For the third 
term, ie {ei,...,erf} and |1| = r, if i = efc, the set (l,r,l,efc) is nonempty 
only if 1 = refc, and in this case (1, r, 1, i) = {(ii, . . . , i^.i, i^) = (0, . . . , 0, 6^)}. 
Similarly as before, the third term above is J2k=i^ek oiQr'') = 
-({cr(a)}I)-n{cr(a)}I),(Q^'=) = -Q,. Now we get 

r— 1 m m 
m=l |i|=l |l|=m {l,m,l,i) 1^=1 

+ i!Mi,oE E n(Qi-)/i-'-Q- 

|i|G(3,r) |l|=r(l,r-l,l,i)?;=l 

and the equivalence of (A. 9) and the recursive relationship for the Mi^^'s in 
Theorem 2.3 is proved. 

We now show that d'„ also satisfies (A. 8). In fact, by (A. 9) we have 

k—l k—l—r 

^n-'/2 Y (K)')Mi,. 
r=0 |i|=0 

k-1 k-l-r / k-1 s \ 

!tYn~^/' Y M, J i! 5:^-/^5: Y m^v^A 

r=0 |i|=0 \ s=|i| |l|=s(l,s,l,i)f=l / 



r=0 s=0 |i|=s |l|=s(l,s,l,i)i;=l 



k-1 (fc-l-r)Vs 
k -t/2 



-Y^-"' E E i'M,,^: Y n(Qi")/i^!- 

r=0 r+s=t |i|=o |l|=s(l,s,l,i)D=l 

Note t <k — l and r + s = i implies — 1 — r > s, so (A; — 1 — r) V s = s, and 
the above is 

E--*/'EEi!MM_.E E n(Qjr)/iJ = o. 

r=0 s=0|i|=0 |I|=s(l,s,l,i)D=l 

The remaining proofs are similar to those in [17] and are omitted. 

Lastly, for general loss function W{-) satisfies Conditions (B8) and (BIO) 
with various derivatives and ||WW(n-i/2(d„ - 6>))|| < C7||6> + 0„p for some 
< C < oo, and 7 > [17]. Then similarly as before, we have 

*i(u) = j W«(0)((0 + uy)-Hl^exp(^-i(0 + u)'l(0 + u))d0, 
' ^ ' J 5uJ 



WW(6')((6> + u)') ^1 \^/r, exp 



Denote = (Wi, 

|I|l/2 



WdY, assume VFfc(l) /O {k 



-(0 + u)'l(0 + u 



dO. 



u=0 

d), and define 



d 



(27r)rf/2 7 du 



exp 



--(0 + u)'i(0 + u; 



u=0 



[The previous situation is a special case with a^j = Eg{6'^'' ^ X)r=i ^rj^r) = 
irkEeiBl") and ~ 7V(0,I-i), or S = {cr(a)}I.] Then, we have Go = Hq 
Gr = H.^ + Qr {l<r<k- 1), with 

r—l m m 
m=0|i|=0 |l|=m{l,m,l,i)i;=l 



|i|=2 |l|=r(l,r-l,l,i)i;=l 
3r+|j| 

|i|=2(lAr)+|jr' 

In the above, we assume $S$ to be nonsingular. Here we no longer have
$\Psi_i^{(j)}(0) = 0$ for $|i + j|$ even, thus the above formula for
$M_{j,r}$ has more terms than stated in Theorem 2.3, and each term has a
more complicated form. □



i,,*i'?j(0) 



Proof of Theorem 2.4. For the hybrid estimator $(\alpha_n,\beta_n)$ given
in (2.1), although it can be formulated as a joint Bayesian estimator with
an additional 0-1 error loss and a constant prior on $\beta$, Theorem 2.3
cannot be applied to get its expansion, as Condition (B10) there excludes
such a loss.

We first outline the idea of the proof. Denote by
$H_r = (h_{r1}',h_{r2}')'$ and $G_r = (g_{r1}',g_{r2}')'$ the component
notations of the $r$th order terms in the expansions of the normalized
MLE and the normalized Bayes estimator of $(\alpha,\beta)$, respectively.
In Theorem 2.2, $H_r$ is a function of $H_0,\ldots,H_{r-1}$. In terms of
the components, $h_{r1}$ is a function of $h_{01},\ldots,h_{r-1,1}$ and
$h_{02},\ldots,h_{r-1,2}$, with some evaluations at $\theta_0$, and
similarly for $h_{r2}$. We denote these functions as, for
$r = 1,\ldots,k-1$,

$$h_{r1} = \mathcal{H}_{r1}(h_{01},h_{02},\ldots,h_{r-1,1},h_{r-1,2}|\theta_0),$$
$$h_{r2} = \mathcal{H}_{r2}(h_{01},h_{02},\ldots,h_{r-1,1},h_{r-1,2}|\theta_0).$$

By Theorem 2.3, the components $g_{r1}$ and $g_{r2}$ are also functions of
$g_{01},\ldots,g_{r-1,1}$ and $g_{02},\ldots,g_{r-1,2}$. We denote them
as, for $r = 1,\ldots,k-1$,

$$g_{r1} = \mathcal{G}_{r1}(g_{01},g_{02},\ldots,g_{r-1,1},g_{r-1,2}|\theta_0),$$
$$g_{r2} = \mathcal{G}_{r2}(g_{01},g_{02},\ldots,g_{r-1,1},g_{r-1,2}|\theta_0).$$
The expansion in Theorem 2.3 can be obtained by another way. Fix 
in the posterior and expand -^/n(Q;„ — olq) to get 

fc-i 

r=0 

where gri(') is the (ii -dimensional version of Gr{-) and with Oq replaced by 
(ctoj/^n)- Now expand i/n(/3„ — /3o) in the gri(-)'s above; this gives the same 
expansion as that in Theorem 2.3, so we must have 

fc-i fe-i 

n-'/'^grliPn) = J2 ^~"^^^rl(g01, g02, • • • , grl, gr2) + Op{n~^/^). 
r=0 r=0 

On the other hand, for the hybrid estimator $(\alpha_n,\beta_n)$ we can
expand the two components simultaneously or componentwise, and the two
expansions are the same. We fix the $\beta$-component, first expand
$\sqrt{n}(\alpha_n - \alpha_0)$, and we have

r=0 

Then, expand -v/n(/3„ — (3q) in the gr.i(-)'s. Comparing the procedures for 
^/n{a.n — cto)!/^ and ^/n(Q.n — cto)]?, , and note the statuses of and (3 
in the gri(')'s are the same, except that the former expands in terms of 
goi, g02, • • • , grl, gr2 and the latter in terms of goi, ho2, • • • , gri, hr.2. So we 
get 

fc-i fc-i 

Y n-'-Z^grSn) = E ^"'/'^rllgOl, h02, ■ • ■ , g.l, h.s) + 0^(71-^/^) 
r=0 r=0 

fc-1 

= E ?^~'^^^[Wrl(g01, ho2, . . . , grl, h^2) + tr] + Opin'''/^). 
r=0 

The last step above is by the recursive relationship in Theorem 2.3. 
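In the same way, for the MLE $(\alpha_n, \beta_n)$, we have

$$\sqrt{n}(\beta_n - \beta_0) = \sum_{r=0}^{k-1} n^{-r/2} h_{r2}(\alpha_n) + O_p(n^{-k/2}),$$

where $h_{r2}(\cdot)$ is the $d_2$-dimensional version of $H_r(\cdot)$, with $\theta_0$ replaced by
$(\alpha_n, \beta_0)$. As before, we have

$$\sum_{r=0}^{k-1} n^{-r/2} h_{r2}(\alpha_n) = \sum_{r=0}^{k-1} n^{-r/2} \mathcal{H}_{r2}(h_{01}, h_{02}, \ldots, h_{r1}, h_{r2}) + O_p(n^{-k/2}).$$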
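Similarly, for the hybrid estimator $(\alpha_n, \beta_n)$, we have

$$\sum_{r=0}^{k-1} n^{-r/2} h_{r2}(\alpha_n) = \sum_{r=0}^{k-1} n^{-r/2} \mathcal{H}_{r2}(g_{01}, h_{02}, \ldots, g_{r1}, h_{r2}) + O_p(n^{-k/2}).$$

These give

$$\begin{pmatrix} g_r \\ h_r \end{pmatrix} = \begin{pmatrix} \mathcal{H}_{r1}(g_0, h_0, \ldots, g_{r-1}, h_{r-1} \mid \theta_0) \\ \mathcal{H}_{r2}(g_0, h_0, \ldots, g_{r-1}, h_{r-1} \mid \theta_0) \end{pmatrix} + \begin{pmatrix} t_r \\ 0 \end{pmatrix},$$

which is the recursive formula in the theorem.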

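Below, we go into some details of the above sketch. We need the following
notation. Let $I = (I_{ij})_{1 \le i,j \le 2}$ be the partition of the Fisher information
matrix, where $I_{11}$ corresponds to the block for $\alpha$ and $I_{22}$ to that for $\beta$. For a
nonnegative integer $d_1$-vector $\mathbf{i}$ and a nonnegative integer $d_2$-vector $\mathbf{j}$, define

$$\rho^1(\alpha) := (\rho^1_1(\alpha), \ldots, \rho^1_{d_1}(\alpha))' = \left(\frac{\partial}{\partial\alpha_1}\log\pi(\alpha), \ldots, \frac{\partial}{\partial\alpha_{d_1}}\log\pi(\alpha)\right)',$$

$$l^1(x|\alpha,\beta) = \left(\frac{\partial}{\partial\alpha_1}\log f(x|\alpha,\beta), \ldots, \frac{\partial}{\partial\alpha_{d_1}}\log f(x|\alpha,\beta)\right)' := (L^1_1(x|\alpha,\beta), \ldots, L^1_{d_1}(x|\alpha,\beta))',$$

$$l^2(x|\alpha,\beta) = \left(\frac{\partial}{\partial\beta_1}\log f(x|\alpha,\beta), \ldots, \frac{\partial}{\partial\beta_{d_2}}\log f(x|\alpha,\beta)\right)' := (L^2_1(x|\alpha,\beta), \ldots, L^2_{d_2}(x|\alpha,\beta))',$$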

-j^ n 1 " 

=7^EL{i;j)(^^-|"'/3), S;i^j)(/3) = -=EL{i;j)(^.l«0,/3), 

1 " 

=^E(L(i;j)(^j-|",/3)-^(c.,/3)L;i,j)(^|a,/3)), 
1 "

A;.j)(/3) =^^(L;i.)(X,|ao,/3)-i?(c.o,/3)L(i;j)(^l"o,/3)), 

S(i;j) = S;i.j)(ao,/3o), ^{i;j) = A;i.j)(ao,/3o), 

1 " 

S(i;j)("'/3) = 7=EL(i;j)(^.-|"'/3)> 

1 " 

and define A^j.j)(Q;, /3), A^j.j-)(Q:), E^j.jj(Q;) and S^j.j^ accordingly. 
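
The expansions that follow repeatedly use multi-index sums of the form $\sum_{|\mathbf{j}|=r} c_{\mathbf{j}}\, x^{\mathbf{j}}/\mathbf{j}!$. As a reading aid, the conventions $|\mathbf{j}| = j_1 + \cdots + j_d$, $\mathbf{j}! = j_1! \cdots j_d!$ and $x^{\mathbf{j}} = x_1^{j_1} \cdots x_d^{j_d}$ can be sketched in a few lines of Python. The function names below are hypothetical and not part of the paper; the multinomial identity $\sum_{|\mathbf{j}|=r} x^{\mathbf{j}}/\mathbf{j}! = (x_1 + \cdots + x_d)^r / r!$ is used only as a sanity check.

```python
import math
from itertools import product


def multi_indices(dim, order):
    """All nonnegative integer vectors j of length `dim` with |j| = order."""
    return [j for j in product(range(order + 1), repeat=dim) if sum(j) == order]


def multi_factorial(j):
    """j! = j_1! * ... * j_d! for a multi-index j."""
    return math.prod(math.factorial(k) for k in j)


def multi_power(x, j):
    """x^j = x_1^{j_1} * ... * x_d^{j_d} for a vector x and multi-index j."""
    return math.prod(xk ** jk for xk, jk in zip(x, j))


def homogeneous_sum(x, order):
    """Sum over |j| = order of x^j / j! (all coefficients set to 1)."""
    return sum(
        multi_power(x, j) / multi_factorial(j)
        for j in multi_indices(len(x), order)
    )


if __name__ == "__main__":
    x = (0.5, -1.0, 2.0)
    # Multinomial theorem: the two printed numbers should agree.
    print(homogeneous_sum(x, 3), sum(x) ** 3 / math.factorial(3))
```

In the proof the coefficients are the quantities indexed by $(\mathbf{i};\mathbf{j})$ rather than 1, but the index bookkeeping is the same.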

We first give the expression for hg; the expression for go will be out- 
lined later. Fix q;„, note here Pq does not depend on (3, E^q.q^ = and 

^(0;0)(""') ^ ^- -'^^^ ^'n = V^0n " /^o); ^nd = ^/n{c^n - Qq) (not to be 
confused with the transpose). As in the proof of Theorem 2.2, we have 

= S?o;0) («n>/3n) = Sjo,0) ("n, /^Q + n-'/^f3'J 
r=0 |j|=r J- 



~ n 



l/2E2o^O)("n) 



r=0 \|j|=r J' |j|=»-+l ^' ' 

r=0 \|i|=r+l 



s+t=r|j|=s J' |i|=t 
- / 



«+*='" |j|=s+l |i|=t 

2 / ' ' 

= '^(0;0) - l21«n " l22/3„ 

(A.IO) +E--^/^f E ^4;0) 

r=l \|i|=r+l 
+ 2^ ^{i;j)

s+t=r|j|=s J- |i|=j 

+ ^] — ^1 — ^(i;j) ' 

s+t=r- |j|=s+l J' \i\=t ■ / 

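or $I_{21}\alpha_n' + I_{22}\beta_n' = \Delta^2_{(0;0)} + O_p(n^{-1/2})$. Similarly, fixing $\beta_n$ and expanding $\alpha_n$,
we have $I_{11}\alpha_n' + I_{12}\beta_n' = \Delta^1_{(0;0)} + O_p(n^{-1/2})$. So we get

$$\sqrt{n}\, I \begin{pmatrix} \alpha_n - \alpha_0 \\ \beta_n - \beta_0 \end{pmatrix} = \begin{pmatrix} \Delta^1_{(0;0)} \\ \Delta^2_{(0;0)} \end{pmatrix} + O_p(n^{-1/2}),$$

and note that $(\Delta^{1\prime}_{(0;0)}, \Delta^{2\prime}_{(0;0)})' = \Delta_0$; this gives the expression for $g_0$ and $h_0$.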
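
As an illustration of the first-order relation $\sqrt{n}(\theta_n - \theta_0) = I^{-1}\Delta_0 + O_p(n^{-1/2})$ for the MLE, the following minimal sketch uses a model that is not in the paper: i.i.d. $N(\mu, v)$ observations with $\theta = (\mu, v)$, for which the Fisher information is $\mathrm{diag}(1/v, 1/(2v^2))$ and the MLE is available in closed form. All names are hypothetical. In this particular model the remainder is exactly $(0, -\sqrt{n}(\bar{x} - \mu_0)^2)$, which is of order $n^{-1/2}$ in probability.

```python
import math
import random


def first_order_check(n, mu0=1.0, v0=4.0, seed=0):
    """Compare sqrt(n) * (MLE - theta0) with I^{-1} * Delta_0 for N(mu, v) data."""
    rng = random.Random(seed)
    x = [rng.gauss(mu0, math.sqrt(v0)) for _ in range(n)]
    xbar = sum(x) / n

    # Closed-form MLE of (mu, v) and its scaled deviation from theta0.
    v_hat = sum((xi - xbar) ** 2 for xi in x) / n
    scaled_mle = (math.sqrt(n) * (xbar - mu0), math.sqrt(n) * (v_hat - v0))

    # Normalized score Delta_0 = n^{-1/2} * sum of d/dtheta log f(x_j | theta0).
    score_mu = sum((xi - mu0) / v0 for xi in x) / math.sqrt(n)
    score_v = sum(
        -0.5 / v0 + (xi - mu0) ** 2 / (2.0 * v0 ** 2) for xi in x
    ) / math.sqrt(n)

    # I = diag(1/v0, 1/(2 v0^2)), hence I^{-1} = diag(v0, 2 v0^2).
    leading = (v0 * score_mu, 2.0 * v0 ** 2 * score_v)

    remainder = (scaled_mle[0] - leading[0], scaled_mle[1] - leading[1])
    return scaled_mle, leading, remainder, xbar


if __name__ == "__main__":
    for n in (100, 10_000):
        _, _, remainder, _ = first_order_check(n)
        print(n, remainder)
```

The higher-order terms $h_1, h_2, \ldots$ in the expansion refine exactly this remainder.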

To prove the expansions for the gj,'s and hj.'s, we use induction. We only 
need to prove those for the g','s and h^'s, where (^r) = I(h'^)- Now we prove 

s+t=r-l\i\=s+l |l|=t (0,t,l,i)»'=0 

+ E E EEA?i,)E E n(gi")/i^! 

a+fe+c=r-l s+t=c |jl=s |il=t |l|=a {0,a,l,i) v=0 

xE E ri(hi")/ij 

|l|=fe(0,fe,lj)*'=0 

+ E E E EE?i,)E E n(gj.")/i^! 

a+fe+c=r-l s+t=c |j|=s+l |i|=t |l|=a {0,a,l,i) t'=0 

xE E n(Kr)/iJ, 

|l|=6(0,M,j)'^=0 

g(, = h', + (1 < r < /c — 1), and the q^.'s are outlined later. 
Since in the following t = r, we get 

E E 4;o)E E ri(gjr)/ij=EE?i;o)(g^) = -i2ig.; 

s+t=r |i|=s+l |l|=t (0,t,l,i)t'=0 |i|=l 

similarly, in the following, we take b = r (note a = r will result in summations 
over empty sets) and we get 

E E E E4;j)E E n(gi^)A^!E E IKk^)/!^' 

a+b+c=rs+t=c\j\=s+l\i\=t |l|=a (0,a,l,i) «=0 |l|=fe (0,6,l,j) ^'=0 

= — l22hr, 
and the previous expression for $I_{21}g_r + I_{22}h_r$ ($1 \le r \le k-1$) is rewritten
as

E E E?,o)E E ri(gi")/i^' 

s+t=r\i\=s+l |l|=i(0,t,l,i)f=0 

+ E E EE^(i;j)E E n(gi")A^' 

a+b+c=r s+t=c |j|=s \i\=t \\\=a (0,a,l,i) v=0 

(A.ii) e i[iK)nvi 

\l\=b(0,b,\j)v=0 

+ E E E EE?i;j)E E n(gi")/i''! 

a+fe+c=rs+i=c|j|=s+l |i|=t |l|=a (0,a,l,i) t'=0 

xE E n(K^)/iJ = 0. 
|i|=b(OAiJ)''=o 

Let a'^ = Er=d "-"''^^gr and = T,r=o^~''^^K, then 

((a:n~i!E-^''/'E E fl(g;")/i^!, 

r=0 |l|=r(0,r,l,i)t)=0 

((^JJ) 'j!E-"'"^'E E f[iK)nj- 

r=0 |l|=r(0,r,lj)?^=0 

By (A.10) and (A.11), we have

2 ^ " " 



(0;0) 

+ E--^/^f E ^Ej,o) 



r=l \|i[=r+l 

^ ((to v-OM) .2 
+ ^(i;j) 

s+t=r\i\=s ■>■ \i\=t 

+ 2^ ^^Z.^^%j) 

s+t=r |j| = <i+l J' |i|=t ■ / 
r=0 \|i|=r+l ■ s+i=r |jl=s J' |i|=t 
+ 2^ 2^ :i 2^ i, ^(i;j)

^+*=''|j|=s+l |i|=i ■ ' 



-E--^/^f E E 4;o)E E ri(gi")/i^' 

r=0 \s+t=r- |i|=s+i |l|=t (0,i,l,i)f=0 



+ E E EEA^j)E E n(gi")/i^! 

a+6+c=r s+t=c |j|=s |i|=t |l|=a (0,a,l,i) v=Q 



xE E n(hi")/iJ 



|l|=b (0,6,1 j)f=0 

+ E E E EE^j)E E n(gi")/i^! 

a+6+c=r s+t=c |j|=s+l \\\=t \\\=a (0,a,l,i) v=Q 



E E Vimr^.^ 

ll=6('0.fe.l.i')f=0 / 



X 

=6{0,fe,l,j) 

= 0. 

Thus, (A. 10) minus (A. 12) will give the expression for h(,. 

Now, we outline the expressions for the $g_r$'s. We need to modify the result
in Lemma 1. For fixed $\beta_n$, define

v ^ a\ /A /(2;^|ao + Q:?^~^/^/9n) V("0 + "™~^^^) 
Z„(o;,/3„) = \Y -— ^ — — . 

Define '?(i.j)(")) '^{\ j)(") ^"^^ Q\ accordingly. We have 
4;0)(^n) ~ E E 4;j)((^n)^)/j' 

r=0 |j|=r 

= E--^/^E E E4;j)E E ri((h.)'^)/i.! 

r=0 |j|=r s+t=r |j|=s |l|=t (0,l,j) ^^=0 

and 

k—\ 

4;0)(^n) ' E^"''^'E^(i;j)((^n)^)/j! 

r=0 |j|=r 

= E--^/^E E E4;j)E E ri((h.)'")/ij. 

r=0 |j|=rs+«=r-|j|=s ll|=i (0,t,l,j) ^^=0 
From these we can get the expansion for $F_{\mathbf{i},r}(\beta_n)$ by the relationship

t+<i=r j>i,|j|=t \\\=s (0,s,l,j-i) v=f) 

+ E E ^(j;o)(^JE E i{{K)r^v\ 

t+s=r+l j>i,|j|=t |l|=s (0,s,l,j-i) i'=0 

+ E E 4;o)(^n)E E i{{K)n.^- 

t+s=r+2 j>i,|j|=t \\\=s (0,s,l,j-i) ''=0 

With these $F_{\mathbf{i},r}(\beta_n)$'s we can get the expansion for

r r+2 v+2 p'^nlh \ 

iv,.(/3„)=i:nE E ns- 

/2(r-,|i|)f=l|j|=2/i(2,«+2,|k„|,i„)«=2 ''^■^■i-' 

Also, note in this case 

|i|=2 

^E--^/^E E E4;j)E E ri(h;")/ij. 

r=0 |i|=2s+i=r- |j|=s |l|=t (0,t,I,j) ^'=0 

Then we can get the expansion for $Z_n(\alpha + \alpha_n', \beta_n)/Z_n(\alpha_n', \beta_n)$. Going through
the remaining part of the proof of Theorem 2.3, we get two relationships
corresponding to (A.10) and (A.12) for the $g_r$'s. Taking these together, as in

the proof of Theorem 2.2, we get ~ (1 + 0{n-^''^))l{Q.'^ - al,p'„^ - PJ' , 

and thus (q;^,^„)' ~ (q!„,/9„)'. Other proof details can be similarly obtained 
and are omitted. 

As an alternative simple, but not rigorous, justification in the proof of 
Theorem 2.3, replace p;, a and I with {pj ,0')', ai and In, set n{6) = 
7r(Q;)7r(/3), with 7r(/3) being constant, and W{d, 6) = W{di,a)V{d2, f3), with 
V{-, •) be the 0-1 loss. Note V(^) = a.e. (/3). Then, find the expression for 
Mj^r = (Mj"'^, O')' as the way to the end of the proof in that theorem. □ 
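
The heuristic above rests on a standard fact: with a constant prior and 0-1 loss (understood in the usual limiting sense, so that the Bayes rule is the posterior mode), the Bayes estimate of a component coincides with its MLE. A minimal sketch of this fact, using a Bernoulli likelihood and a grid search that are assumptions made here for illustration and are not taken from the paper:

```python
import math


def log_likelihood(p, successes, trials):
    """Bernoulli log-likelihood for success probability p in (0, 1)."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def posterior_mode_flat_prior(successes, trials, grid_size=10001):
    """Posterior mode under a constant prior, found on a grid inside (0, 1).

    With a flat prior the log-posterior equals the log-likelihood up to an
    additive constant, so the maximizer is the same.
    """
    grid = [(k + 0.5) / grid_size for k in range(grid_size)]
    return max(grid, key=lambda p: log_likelihood(p, successes, trials))


def mle(successes, trials):
    """Closed-form Bernoulli MLE."""
    return successes / trials


if __name__ == "__main__":
    # The two values should agree up to the grid resolution.
    print(posterior_mode_flat_prior(7, 20), mle(7, 20))
```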

Proof of the Fact, (i) is a special case of (ii) with Pi = (|i| = 0, 1). 

(ii) The key is to find out the set (0, s, 1, i) for given 1 and i. It is empty 
if 1 7^ and i = 0. For Hi, s + 1 = 1 we must have {s,t) = (0, 1) or (1,0). 
If {s,t) = (0, 1), for the first term i = 1 = and the set (0,0,1, i) = {io = 
0}, so the first term is I~^Pq. For the second term, |i| = 1, so i = for 
some j, 1 = and the set (0,0,1, i) = {io = ej}, and so the second term is 
1^1 J2j Ae^. {Hi') = I^^ E|i|=i Ai(Hj,). For the third term, |i| = 2, 1 = and
the set (0, 0, 1, i) = {io = i}; this term is E\i\=2 Ei(Hj)) /i!. If (s, t) = (1, 0), 
the summation X]|i|=-i the first term is empty, and the set (0,1,1,1) in 
the second term is empty. Also, since t = 0, the summation J2\i\=t+i,t>o 
the third term is empty. These give the expression for Hi. 

For H2, the case {s,t) = (2,0) corresponds to empty summations. So we 
only consider {s,t) = (0,2) or (1, 1). When {s,t) = (0,2), for the first term, 
|i| = 1, 1 = and the set (0, 0,1, 1) = {Iq = 1}; this term is S|i|=i Pi(Ho). 
For the second term we have 1 = 2ej for some j or 1 = e^- + e; for some j 
Since 1 = 0, the set (0, 0, 1, i) = {Iq = 1} and this term is Z]|i|=2 Ai(HQ)/i!. 
For the third term, |i| = 3, 1 = and (0,0,1,1) = {Iq = 1}, resulting in 
I"^E|i|=3Ei(Hj))/i!. When {s,t) = (1,1), for the first term, 1 = 0, |1| = 1 
and the set (0,1,1,1) is empty. For the second term, we have |1| = |1| = 1. 
If 1 / 1, (0, 1, 1, 1) is empty. If 1 = 1, (0, 1, 1, 1) = {(Iq, li) = (0, 1)}, this term 
is E|i|=i Ai(Hi). For the third term, |i| = 2 and so it has the form 
i = Bj + 6; for some j,l and |1| = 1. It is easily checked that if 1 = or 
e;, (0,1,1,1) = {(lo,li) = {ej,ei) or (e;,ej)}, otherwise (0,1,1,1) is empty. So 
this term is I"! E'z=i Ee,+e, ((H^^Ht') + (H^'Ht^ ))/2. 

Note that for $d = 1$, it is easy to see that $H_0$, $H_1$ and $G_1$ (for $a = 2$) above
coincide with the corresponding expressions on page 496 in [17]. $h_2$ there has two extra
terms piE^/I^ + P1A2/P . These two terms come from pi E/i (0,1,1,0) 11^=0 Kj /^v^- 
in his formula. Obviously, /i(0, 1, 1,0) is an empty set by definition. So these 
extra terms should not be there. 

(iii) By Theorem 2.3, Gi = Hi + Qi, Qi = Mo,i, and it is easily checked 

that Mo,i = ({cr(a)})-^E|i|=3M,i*i, and note *i = *|°''(0). To evaluate 
A^'i^i, for |1| =3, note 

/2(l,i)/i(2,3,ki,n)|j|=2 "J-^J-'' 

To get 12(151)5 we first find the corresponding /o(l,l,l) = lJj.>o/i(l, 1, l,r). 
It is easy to see that /i(l, 1,1,1) = {1} and /i(l,l,l,r) is empty for r 7^ 
1, so /o(l,l,l,) ={n =1}, and 12(1,1) ={ki:ki=l,2ii<|ki|<3n} = 
{ki : ki = 1}. It is easy to see that /i(2, 3, ki, ii) = Ii(2, 3, 1, 1) = {nj : Ejj|=2 J'^-j — 
1, Ejj|=2^j ~ There are d? of uj's with |j| = 2, and of uj's with |j| = 3, 
but only one of them can be 1; the rest are zeros, so /i(2, 3, 1, 1) = {uj : Uj = 
l,nj =0, for j / 1, 2 < |j| < 3}. These give, for |1| = 3, A^i^i = -Sfi. Refer to 
the definition of Fi r in Lemma 1; since |1| =3, it is easy to see that the 
first two terms in F{ i are zeros as they are summations over empty sets. 
Also, |1| =3, the constraints t > 3, j > 1 and |j| = t gives s = 0, j = 1 and 
(0, 1, j - i) = {io = 0}, and so Fj^i = Si. Thus

Gi = Hi + Mo,i = Hi + ({<T(a)}I)-i ^ 

|i|=3 

G2 = H2 + Q2, Q2 = Mo,2 + E|i|=iMi,iE|i|=iE(i,i,i,i)(Qi)/i! = Mo,2 + 
E|i|=i Mi,i(Qi)/i!. Note Mo,2 = ({^(a)}!)-! Eie(2,6> M,2*i = ({^(a)}!)-! x 
(E|i|=3^i,2*i + E|i|=5^^i,2*i) and, for |i| = 1, Mi,i = ({^(a)}I)"i x 
E|j|=3A^j-i,i*j-i-For |i| = 3, 

/2(2,i)f=l/i(2,«+2,k„,i„) |j|=2 J'^-''^ 

For /2(2,i), the corresponding /o(l, 2, 2) = U^>o /i(l, 2, 2, r) = /i(l, 2, 2, 1) = 
{(ii,i2) = (0,1)}, so l2(2,i) = {(ki,k2) :ki +k2 = i, |ki| = 0,2 < |k2| < 4} = 
{(ki, ks) : ki = 0, k2 = i}, /i(2, 3, ki, ii) = /i(2, 3, 0, 0) = {^/j : ^xj = 0, 2 < |j| < 
3} and /i(2,4,k2,i2) = /i(2,4,i,l) = {uj : E|j|=2 J^^j = E|j|=2«j = 1} = 
{uj :ui = l,iij = 0, for j 7^ i, 2 < |j| < 4}. Also, it can be checked that, for 
|i| = 3, Fi,2 = 5i + E|j|=i'?i+j(HJo). RecaU = I-^Ai^o- These give, for 
|i|=3, 

V |j|=i / 

For |i| = 5, 12(2, i) is empty, so A^i^2 = 0. (In [17], the set 12(2, 5) is also 
empty; but there A'^5^2 7^ and we regard this as a mistake.) Similarly, for 
|i| = 1 and |j| = 3 with j > i, 

iVj-i,i = Fj„i,i/(j - i)! = + Ao)') j . 

From these we get the expression for G2. 

(iv) By the above results, the first di components of Ti is ti = I^^Pq + 
qi, qi = mo,i, which is the (ii -dimensional version of Mo,i. So mo,i = 
({cr(ail^^)})~^ X E|i|=3 *['^i^/i'> and and are the corresponding di- 
dimensional versions of their counterparts. □ 

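Proof of the Proposition. (i) Let $\Lambda_{\mathbf{i}}$ and $\Lambda_0$ be the weak limits of
$\Delta_{\mathbf{i}}$ and $\Delta_0$. It is easy to see that $\Lambda_{\mathbf{i}} \sim N(0, J_{\mathbf{i}})$, with
$J_{\mathbf{i}} = E_{\theta_0}[L_{\mathbf{i}}(x_1|\theta_0)L_{\mathbf{i}}'(x_1|\theta_0)] - E_{\mathbf{i}}E_{\mathbf{i}}'$, $\Lambda_0 \sim N(0, I^{-1})$, and $\Lambda_{\mathbf{i}}$ and $\Lambda_0$ are jointly normal with covariance
matrix, for $\mathbf{i} = e_j$, $E_{\theta_0}[L_{\mathbf{i}}(x_1|\theta_0)L_0'(x_1|\theta_0)] - E_{\mathbf{i}}E_0' = E_{\theta_0}[L_{\mathbf{i}}(x_1|\theta_0)L_0'(x_1|\theta_0)] = D_{\mathbf{i}}$.
Thus $\Lambda_{\mathbf{i}} | \Lambda_0 \sim N(D_{\mathbf{i}} I \Lambda_0, J_{\mathbf{i}} - D_{\mathbf{i}} I D_{\mathbf{i}}')$, and $(I^{-1}\Lambda_0)^{e_j} = I_j^{-1}\Lambda_0 = \Lambda_0'(I_j^{-1})'$, so

$$E_{\theta_0}\big(\Lambda_{\mathbf{i}}\,(I^{-1}\Lambda_0)^{e_j}\big) = E_{\theta_0}\big[E_{\theta_0}(\Lambda_{\mathbf{i}} | \Lambda_0)\,(I^{-1}\Lambda_0)^{e_j}\big]$$
$$= E_{\theta_0}\big[D_{\mathbf{i}} I \Lambda_0\,(I^{-1}\Lambda_0)^{e_j}\big] = E_{\theta_0}\big[D_{\mathbf{i}} I \Lambda_0 \Lambda_0' (I_j^{-1})'\big]$$
$$= D_{\mathbf{i}} I\, E_{\theta_0}[\Lambda_0\Lambda_0']\,(I_j^{-1})' = D_{\mathbf{i}} I I^{-1} (I_j^{-1})' = D_{\mathbf{i}} (I_j^{-1})'.$$

Similarly, for $|\mathbf{i}| = 2$, $\mathbf{i} = e_j + e_l$ for some $j, l$. Thus,

$$E_{\theta_0}\big((I^{-1}\Lambda_0)^{\mathbf{i}}\big) = E_{\theta_0}\big((I_j^{-1}\Lambda_0)(I_l^{-1}\Lambda_0)'\big)$$
$$= I_j^{-1} E_{\theta_0}(\Lambda_0\Lambda_0') (I_l^{-1})' = I_j^{-1} I^{-1} (I_l^{-1})',$$

and now the result follows using Fact (i) and taking the corresponding
summations.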
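
The conditioning step in (i) is the usual formula for jointly normal vectors: if $\mathrm{Var}(Y) = \Sigma$ and $\mathrm{Cov}(X, Y) = D$, then $E(X | Y) = D\Sigma^{-1}Y$, so $E[X\,(AY)_j] = D A_j'$ for a symmetric $A = \Sigma$ whose $j$th row is $A_j$. The Monte Carlo sketch below checks this with $\Sigma = I^{-1}$ and $A = I^{-1}$; the $2 \times 2$ information matrix, the row vector $D$, the noise level and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def chol2(m):
    """Lower Cholesky factor of a symmetric positive definite 2x2 matrix."""
    (a, b), (_, d) = m
    l11 = math.sqrt(a)
    l21 = b / l11
    return [[l11, 0.0], [l21, math.sqrt(d - l21 * l21)]]


def matvec(m, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def simulate(info, d_row, noise_sd, n_draws, seed):
    """Monte Carlo estimate of E[Lambda_i * (I^{-1} Lambda_0)_j], j = 0, 1."""
    rng = random.Random(seed)
    sigma = inv2(info)            # assumed Var(Lambda_0) = I^{-1}
    chol = chol2(sigma)
    sums = [0.0, 0.0]
    for _ in range(n_draws):
        lam0 = matvec(chol, [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        # Conditional mean D * I * Lambda_0, plus independent noise.
        cond_mean = sum(dk * vk for dk, vk in zip(d_row, matvec(info, lam0)))
        lam_i = cond_mean + noise_sd * rng.gauss(0.0, 1.0)
        h0 = matvec(sigma, lam0)  # I^{-1} * Lambda_0
        sums[0] += lam_i * h0[0]
        sums[1] += lam_i * h0[1]
    return [s / n_draws for s in sums]


def closed_form(info, d_row):
    """D times the j-th column of I^{-1}, for j = 0, 1."""
    sigma = inv2(info)
    return [d_row[0] * sigma[0][j] + d_row[1] * sigma[1][j] for j in range(2)]


if __name__ == "__main__":
    info = [[2.0, 0.5], [0.5, 1.0]]
    d_row = [0.6, -0.3]
    print(simulate(info, d_row, 0.5, 100_000, seed=1))
    print(closed_form(info, d_row))
```

The two printed vectors should agree up to Monte Carlo error.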

(ii) Note Ee^iGi) = Ee^iUl) + I^Vo + Mo,i, and the result follows. 

(iii) The proof is similar and is omitted. □ 

Acknowledgments. I am grateful to the reviewers for their helpful sug- 
gestions and comments, especially to Guang Cheng for the topic on objective 
prior. 